OmniEmbed v0.1

Developed by Tevatron
A multimodal embedding model based on Qwen2.5-Omni-7B, supporting unified embedding representations for cross-lingual text, images, audio, and video
Downloads 2,190
Release Time: 4/12/2025

Model Overview

OmniEmbed is a multimodal embedding model that maps multilingual text, images, audio, and video into a single shared embedding space, so that content in one modality can be retrieved with a query in another.
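To illustrate what retrieval over unified embeddings looks like in practice, the sketch below ranks candidates of different modalities against a text query by cosine similarity. It does not call OmniEmbed itself: the 3-dimensional vectors and the names `query`, `corpus`, `cosine_similarity`, and `rank_candidates` are hypothetical stand-ins for embeddings the model would produce, and cosine similarity is assumed as the scoring function.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_candidates(query_vec, candidates):
    """Return (candidate_id, score) pairs, most similar first."""
    scored = [(cid, cosine_similarity(query_vec, vec))
              for cid, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy embeddings; a real model would emit much larger vectors.
query = [0.9, 0.1, 0.0]              # stands in for an embedded text query
corpus = {
    "video_clip": [0.8, 0.2, 0.1],   # stands in for an embedded video
    "audio_clip": [0.1, 0.9, 0.2],   # stands in for an embedded audio clip
    "page_image": [0.0, 0.2, 0.9],   # stands in for an embedded document image
}

ranking = rank_candidates(query, corpus)
for candidate_id, score in ranking:
    print(f"{candidate_id}: {score:.3f}")
```

Because every modality lives in the same vector space, one ranking function serves all of them; no per-modality scoring logic is needed.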

Model Features

Unified Multimodal Embedding
Supports unified embedding representations for text, images, audio, and video, enabling cross-modal retrieval
Cross-lingual Capability
Supports multilingual text retrieval with performance close to that of specialized multilingual retrieval models
High-performance Retrieval
Achieves results on multiple benchmarks comparable to those of specialized single-modality models
Open-source Training
Training data and code are fully open-sourced through Tevatron

Model Capabilities

Text Retrieval
Image Document Retrieval
Video Retrieval
Audio Retrieval
Multilingual Retrieval

Use Cases

Multimedia Retrieval
Video Retrieval
Retrieve relevant video content based on text queries
Achieves R@1 of 51.3 on the MSR-VTT dataset, outperforming the CLIP baseline
Audio Retrieval
Retrieve relevant audio clips based on text descriptions
Achieves R@1 of 34.0 on the AudioCaps dataset, surpassing existing baselines
Document Retrieval
Image Document Retrieval
Retrieve relevant information from documents containing images/charts
Achieves nDCG@5 of 85.8 on the ViDoRe dataset
Multilingual Retrieval
Cross-lingual text retrieval
Achieves nDCG@10 of 69.1 on the MIRACL dataset
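The figures above are reported as R@k and nDCG@k. As a rough guide to what those metrics measure, here is a minimal per-query sketch. The function names and the toy ranking are hypothetical, and the nDCG shown uses the linear-gain formulation, which is one common variant; the exact evaluation scripts behind the reported numbers may differ. Benchmark scores of this kind are typically averaged over all queries and expressed as percentages.

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant items that appear in the top-k results.

    With exactly one relevant item per query, this is 1.0 on a hit and
    0.0 on a miss, which is how R@1 is usually read.
    """
    top_k = set(ranked_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)

def ndcg_at_k(ranked_ids, gains, k):
    """Normalized discounted cumulative gain at cutoff k (linear gain).

    `gains` maps document id to a graded relevance; unlisted ids count as 0.
    """
    dcg = sum(gains.get(doc_id, 0.0) / math.log2(rank + 2)
              for rank, doc_id in enumerate(ranked_ids[:k]))
    ideal_gains = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(ideal_gains))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical system output for one query: the relevant item "d1" is ranked second.
ranked = ["d3", "d1", "d2"]

r_at_1 = recall_at_k(ranked, {"d1"}, k=1)    # miss at rank 1
r_at_2 = recall_at_k(ranked, {"d1"}, k=2)    # hit within the top 2
ndcg_3 = ndcg_at_k(ranked, {"d1": 1.0}, k=3)
print(r_at_1, r_at_2, round(ndcg_3, 4))
```

R@k only asks whether relevant items made the cutoff, while nDCG@k also rewards placing them higher, which is why the second-place hit above scores below 1.0 on nDCG.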