LLaVE-2B

Developed by zhibinlan
LLaVE-2B is a 2-billion-parameter multimodal embedding model based on Aquila-VL-2B, featuring a 4K token context window and supporting embeddings for text, images, multiple images, and videos.
Downloads 20.05k
Release Time: 2/9/2025

Model Overview

LLaVE-2B is a multimodal embedding model capable of generating embeddings for text, images, multiple images, and videos, suitable for tasks like sentence similarity and zero-shot image classification.

Model Features

Multimodal Embedding
Produces embeddings for text, single images, multiple images, and videos, so inputs from different modalities can be compared directly.
4K Token Context Window
Accepts inputs of up to 4K tokens, which allows longer sequences to be processed.
Zero-shot Image Classification
Classifies images in a zero-shot setting, without task-specific training data.
Strong Transfer Learning Capability
Although trained on image-text data, it generalizes well to text-video retrieval tasks.
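
Zero-shot classification with an embedding model is commonly done by comparing an image embedding against the embeddings of candidate label prompts. The following is a minimal sketch of that idea, not the model's actual API: the vectors are hand-written toy values standing in for embeddings the model would produce, and the names `cosine`, `zero_shot_classify`, and `temperature` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_emb, label_embs, temperature=0.05):
    """Turn image-to-label similarities into a probability per label.

    `temperature` is an assumed scaling factor for the softmax; smaller
    values make the distribution sharper.
    """
    sims = {label: cosine(image_emb, emb) for label, emb in label_embs.items()}
    top = max(sims.values())  # subtract the max for numerical stability
    exps = {label: math.exp((s - top) / temperature) for label, s in sims.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Toy 3-dimensional vectors standing in for real model embeddings (assumption).
image_emb = [0.9, 0.1, 0.0]
label_embs = {
    "a photo of a cat": [1.0, 0.0, 0.0],
    "a photo of a dog": [0.0, 1.0, 0.0],
    "a photo of a car": [0.0, 0.0, 1.0],
}

probs = zero_shot_classify(image_emb, label_embs)
best = max(probs, key=probs.get)
print(best)
```

In practice the toy vectors would be replaced by embeddings computed by the model for the image and for each label prompt. No new training is involved, only a similarity comparison.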

Model Capabilities

Text embedding
Image embedding
Video embedding
Multimodal embedding
Sentence similarity calculation
Zero-shot image classification
Video-text retrieval

Use Cases

Image Retrieval
Image-Text Retrieval
Retrieve relevant images based on text descriptions
Reported strong results on the MMEB (Massive Multimodal Embedding Benchmark) leaderboard
Video Retrieval
Zero-shot Video-Text Retrieval
Retrieve relevant videos based on text descriptions
Demonstrated strong zero-shot performance, suggesting potential for transfer to other embedding tasks
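
Both retrieval use cases above reduce to ranking stored embeddings by similarity to a query embedding. The sketch below illustrates that ranking step under stated assumptions: the file names, the toy vectors, and the helper names `normalize` and `retrieve` are hypothetical, and real embeddings would come from the model rather than being written by hand.

```python
import math


def normalize(v):
    """Scale a vector to unit length; reject zero vectors."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def retrieve(query_emb, corpus, k=2):
    """Return the k corpus items most similar to the query.

    `corpus` maps an item id to its embedding. Scores are cosine
    similarities, computed as dot products of unit-length vectors.
    """
    q = normalize(query_emb)
    scored = []
    for item_id, emb in corpus.items():
        e = normalize(emb)
        scored.append((item_id, sum(a * b for a, b in zip(q, e))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical video embeddings (toy values, not model output).
corpus = {
    "clip_beach.mp4": [0.8, 0.2, 0.1],
    "clip_city.mp4": [0.1, 0.9, 0.2],
    "clip_forest.mp4": [0.2, 0.1, 0.9],
}
query_emb = [0.7, 0.3, 0.0]  # stands in for the embedding of a text query

results = retrieve(query_emb, corpus, k=2)
for item_id, score in results:
    print(item_id, round(score, 3))
```

For large collections, the linear scan would typically be replaced by an approximate nearest-neighbor index, but the scoring logic stays the same.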