
mmE5-mllama-11b-instruct

Developed by intfloat
mmE5 is a multimodal multilingual embedding model built on Llama-3.2-11B-Vision. It uses high-quality synthetic training data to improve embedding performance and achieves state-of-the-art results on the MMEB benchmark.
Downloads 596
Release Time: 2/13/2025

Model Overview

This model targets multimodal (image + text) and multilingual embedding tasks. It maps images and text into a unified embedding space, which supports cross-modal retrieval and similarity calculation.
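Because images and text share one embedding space, their vectors can be compared directly. The sketch below illustrates this with cosine similarity, a common choice for embedding models; whether mmE5 scores are computed exactly this way is an assumption here. The 4-dimensional vectors are hypothetical stand-ins, not real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings; real embeddings have far more dimensions.
image_embedding = [0.2, 0.1, 0.9, 0.3]
text_embedding = [0.1, 0.0, 0.8, 0.5]

score = cosine_similarity(image_embedding, text_embedding)
print(f"similarity: {score:.4f}")
```

A score near 1 means the two inputs point in nearly the same direction in the shared space; a score near 0 means they are unrelated.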

Model Features

Multimodal Embedding Capability
Capable of processing both image and text inputs, mapping them into a unified embedding space
Multilingual Support
Supports text in 8 languages, including English, Chinese, and Arabic
High-quality Synthetic Data Training
Trained with specially designed synthetic data to enhance model performance
State-of-the-art Performance
Achieves state-of-the-art results on the MMEB benchmark

Model Capabilities

Image-text similarity calculation
Cross-modal retrieval
Multilingual text embedding
Zero-shot image classification
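Zero-shot image classification with an embedding model is usually done by embedding one text prompt per candidate label and picking the label closest to the image embedding. The following is a minimal sketch of that idea, not the model's own inference code. The `classify` helper, the prompt wording, and the 3-dimensional vectors are all hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def classify(image_embedding, label_embeddings):
    """Return (best_label, scores); scores maps each label to its cosine similarity."""
    img = normalize(image_embedding)
    scores = {}
    for label, emb in label_embeddings.items():
        txt = normalize(emb)
        scores[label] = sum(x * y for x, y in zip(img, txt))
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Hypothetical text embeddings, one per candidate label prompt.
label_embeddings = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
# Hypothetical image embedding in the same space.
image_embedding = [0.8, 0.3, 0.1]

best_label, scores = classify(image_embedding, label_embeddings)
print(best_label)
```

No classifier is trained here: adding a new class only requires embedding one more label prompt.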

Use Cases

Cross-modal Retrieval
Image Search
Retrieve relevant images through text queries
Example: the text query 'a cat and a dog' matches an image with a similarity score of 0.4219
Text Search
Retrieve relevant text descriptions through images
Example: an image matches the text 'a cat and a dog' with a similarity score of 0.4414
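Both retrieval directions above reduce to the same operation: score every candidate against the query embedding and return the highest-scoring items. A rough sketch of that ranking step follows, assuming cosine similarity as the score. The `top_k` helper, the file names, and the vectors are hypothetical, and the printed scores are unrelated to the 0.4219 and 0.4414 values quoted above.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def top_k(query_embedding, corpus, k=2):
    """Return the k (item_id, score) pairs most similar to the query, best first."""
    q = l2_normalize(query_embedding)
    scored = []
    for item_id, emb in corpus.items():
        e = l2_normalize(emb)
        scored.append((sum(x * y for x, y in zip(q, e)), item_id))
    scored.sort(reverse=True)
    return [(item_id, score) for score, item_id in scored[:k]]


# Hypothetical image embeddings keyed by file name.
image_corpus = {
    "cat_and_dog.jpg": [0.6, 0.8, 0.0],
    "beach.jpg": [0.1, 0.0, 0.9],
    "city.jpg": [0.5, 0.1, 0.6],
}
# Hypothetical embedding of the text query 'a cat and a dog'.
query_embedding = [0.7, 0.7, 0.1]

results = top_k(query_embedding, image_corpus, k=2)
for item_id, score in results:
    print(f"{item_id}: {score:.4f}")
```

For text search by image, the same function applies with an image embedding as the query and text embeddings as the corpus. A linear scan like this is only practical for small collections; larger ones typically use an approximate nearest-neighbor index.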
Multilingual Applications
Multilingual Image Annotation
Generate multilingual descriptions or labels for images