
NLLB-CLIP Large SigLIP

Developed by visheratin
NLLB-CLIP-SigLIP is a multilingual vision-language model that combines the text encoder of the NLLB model and the image encoder of the SigLIP model, supporting 201 languages.
Downloads: 384
Release Time: 11/14/2023

Model Overview

This model combines the text encoder of NLLB with the image encoder of SigLIP. It is particularly strong on cross-modal tasks in low-resource languages and performs very well on the Crossmodal-3600 dataset.

Model Features

Multilingual support
Supports 201 languages from Flores-200, including many low-resource languages
Cross-modal capability
Combines text and image encoders, excelling at image-text matching tasks
Low-resource language performance
Achieves state-of-the-art performance on low-resource languages
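Image-text matching in CLIP-style dual encoders like this one typically works by mapping the image and the text into vectors of the same size and scoring the pair by cosine similarity. The sketch below assumes the embeddings have already been produced by the model; the vectors are made-up placeholders and the helper names (l2_normalize, match_score) are hypothetical, not part of any published API for this model.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that dot products equal cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def match_score(image_emb: Sequence[float], text_emb: Sequence[float]) -> float:
    """Cosine similarity between an image embedding and a text embedding."""
    if len(image_emb) != len(text_emb):
        raise ValueError("embeddings must have the same dimension")
    a = l2_normalize(image_emb)
    b = l2_normalize(text_emb)
    return sum(x * y for x, y in zip(a, b))


# Hypothetical 4-d vectors standing in for real encoder outputs.
image = [0.9, 0.1, 0.0, 0.2]
caption_match = [0.8, 0.2, 0.1, 0.1]  # e.g. an accurate caption in any supported language
caption_other = [0.0, 0.1, 0.9, 0.3]  # an unrelated caption

print(f"matching caption:  {match_score(image, caption_match):.3f}")
print(f"unrelated caption: {match_score(image, caption_other):.3f}")
```

Because the score depends only on the two vectors, the caption language does not change how matching is computed; multilingual quality comes entirely from how well the text encoder places different languages in the same space.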

Model Capabilities

Multilingual image classification
Cross-lingual image retrieval
Zero-shot learning
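Zero-shot classification with a CLIP-style model is usually done by embedding one text prompt per class (the prompts may be in any supported language), comparing each with the image embedding, and applying a softmax over the scaled similarities. This is a minimal sketch assuming precomputed embeddings: the logit scale of 100.0 is an assumption borrowed from the original CLIP setup rather than a value documented for this model, and SigLIP-trained models may instead score pairs with a sigmoid using a learned temperature and bias. All vectors and function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def zero_shot_classify(
    image_emb: Sequence[float],
    label_embs: Dict[str, Sequence[float]],
    logit_scale: float = 100.0,  # assumption: CLIP-style temperature, not documented for this model
) -> Dict[str, float]:
    """Return a probability per label via a softmax over scaled cosine similarities."""
    labels = list(label_embs)
    logits = [logit_scale * cosine(image_emb, label_embs[name]) for name in labels]
    peak = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return {name: e / total for name, e in zip(labels, exps)}


# Hypothetical text embeddings for class prompts written in three different languages.
label_embeddings = {
    "a photo of a cat": [1.0, 0.0, 0.0],       # English
    "una foto de un perro": [0.0, 1.0, 0.0],   # Spanish
    "une photo d'un oiseau": [0.0, 0.0, 1.0],  # French
}
image_embedding = [0.9, 0.1, 0.0]  # hypothetical image embedding

probs = zero_shot_classify(image_embedding, label_embeddings)
for label, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{p:.4f}  {label}")
```

No classifier is trained here: adding a class only means embedding one more prompt, which is what makes the approach zero-shot.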

Use Cases

Multilingual content understanding
Multilingual image classification
Classify images using text labels in different languages
Outstanding performance on the Crossmodal-3600 dataset
Cross-lingual image retrieval
Retrieve relevant images using queries in different languages
Supports queries in 201 languages
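Cross-lingual retrieval follows the same pattern in the other direction: the query text, in any supported language, is embedded once and the image collection is ranked by cosine similarity to it. In the sketch below the gallery and query vectors are hypothetical placeholders for precomputed encoder outputs, and top_k_images is an illustrative helper, not part of the model's distribution.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def top_k_images(
    query_emb: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return up to k (image_id, score) pairs most similar to the query, best first."""
    scored = ((image_id, cosine(query_emb, emb)) for image_id, emb in gallery.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical precomputed image embeddings.
gallery = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "city.jpg": [0.1, 0.9, 0.2],
    "forest.jpg": [0.0, 0.2, 0.9],
}
# Hypothetical embedding of a non-English query such as "une plage ensoleillée".
query = [0.8, 0.2, 0.1]

results = top_k_images(query, gallery, k=2)
for image_id, score in results:
    print(f"{score:.3f}  {image_id}")
```

This linear scan is fine for small collections; for large ones an approximate nearest-neighbour index over the image embeddings would normally replace it, while the scoring stays the same.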
© 2025 AIbase