NLLB-SigLIP-MRL Base

Developed by visheratin
A multilingual vision-language model that combines the NLLB text encoder with the SigLIP image encoder, supporting 201 languages and multiple embedding dimensions
Downloads 352
Release Time: 2/22/2024

Model Overview

This model pairs the NLLB text encoder with the SigLIP image encoder, supports the 201 languages of Flores-200, and uses Matryoshka (nested) representation learning to produce embeddings of several dimensions.

Model Features

Multilingual support
Supports the 201 languages of Flores-200 through the NLLB text encoder
Variable embedding dimensions
Uses Matryoshka (nested) representation learning to generate embeddings with 32, 64, 128, 256, or 512 dimensions
High-performance retrieval
Sets a new state of the art (SOTA) for multilingual image-text retrieval on the XTD10 and Crossmodal-3600 datasets
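
As an illustration of how variable embedding dimensions can be used downstream, here is a minimal sketch in plain Python. It assumes the common nested-embedding convention in which a smaller embedding is the leading slice of the full vector, rescaled to unit length; whether this model exposes its smaller sizes this way or through its own interface is not stated on this page. The helper names truncate_embedding and cosine are hypothetical, and the vectors are placeholders, not real model outputs.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale them to unit length.

    Assumption: smaller embeddings are prefixes of the full vector.
    """
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Placeholder 512-dimensional vectors standing in for an image
# embedding and a text embedding produced by the model.
image_vec = [math.sin(i) for i in range(512)]
text_vec = [math.sin(i + 0.1) for i in range(512)]

# Compare the same pair at each of the listed embedding sizes.
for dim in (32, 64, 128, 256, 512):
    score = cosine(
        truncate_embedding(image_vec, dim),
        truncate_embedding(text_vec, dim),
    )
    print(dim, round(score, 4))
```

Smaller sizes reduce storage and speed up similarity search, usually at some cost in retrieval accuracy, so the size is typically chosen per application.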

Model Capabilities

Multilingual image classification
Cross-modal retrieval
Zero-shot learning
Variable-dimension embeddings

Use Cases

Multilingual content understanding
Multilingual image classification
Classify images using text labels in different languages
Cross-modal retrieval
Image-text retrieval
Retrieve images from text queries and texts from image queries in multilingual settings
Achieves SOTA performance on the XTD10 and Crossmodal-3600 datasets
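
The classification use case above can be sketched as ranking text labels by their similarity to an image embedding. The following plain-Python example assumes the image and label embeddings have already been computed by the model; the function names normalize and rank_labels are hypothetical, the 4-dimensional vectors are toy placeholders, and cosine similarity is used here as a generic scoring choice rather than as the model's documented scoring method.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def rank_labels(image_vec, label_vecs):
    """Return (label, similarity) pairs sorted from best to worst match."""
    img = normalize(image_vec)
    scored = [
        (label, sum(a * b for a, b in zip(img, normalize(vec))))
        for label, vec in label_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy placeholder embeddings; real ones would come from the model's
# image encoder and text encoder respectively.
image_vec = [0.9, 0.1, 0.0, 0.2]
label_vecs = {
    "a photo of a cat": [0.8, 0.2, 0.1, 0.1],         # English
    "ein Foto von einem Hund": [0.1, 0.9, 0.2, 0.0],  # German
    "una foto de un coche": [0.0, 0.1, 0.9, 0.3],     # Spanish
}

ranking = rank_labels(image_vec, label_vecs)
for label, score in ranking:
    print(f"{score:.3f}  {label}")
```

The same ranking routine covers image-text retrieval: swap the label embeddings for a collection of caption or image embeddings and keep the top results.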