NLLB-CLIP Base SigLIP
NLLB-CLIP-SigLIP is a multilingual vision-language model that combines the text encoder from NLLB and the image encoder from SigLIP, supporting 201 languages.
Downloads 478
Release Time: 11/14/2023
Model Overview
This model pairs the NLLB text encoder with the SigLIP image encoder. It targets cross-modal tasks such as image-text retrieval and zero-shot image classification, and is reported to be particularly strong on low-resource languages.
Model Features
Multilingual support
Supports 201 languages from Flores-200, with particular strength in low-resource languages
Cross-modal capability
Combines text and image encoding capabilities, suitable for cross-modal tasks
Superior performance
Reported to set a new state of the art on the Crossmodal-3600 dataset
Model Capabilities
Zero-shot image classification
Multilingual text understanding
Cross-modal retrieval
Use Cases
Multilingual applications
Multilingual image classification
Classify images using different languages
Reported to perform well across many languages
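As a rough illustration of how zero-shot classification with a dual-encoder model of this kind typically works: the image and each candidate label (written in any supported language) are embedded into the same vector space, and the label whose embedding is most similar to the image embedding is chosen. The sketch below assumes the embeddings have already been computed. The toy vectors, the label strings, and the function names are hypothetical placeholders, not outputs of the actual model or part of its API.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def zero_shot_classify(image_embedding, label_embeddings):
    """Return (best_label, scores), where scores maps label -> cosine similarity."""
    scores = {
        label: cosine_similarity(image_embedding, emb)
        for label, emb in label_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy 3-dimensional vectors standing in for real encoder outputs (assumption).
image_embedding = [0.9, 0.1, 0.0]
label_embeddings = {
    "a photo of a cat": [1.0, 0.0, 0.0],       # English prompt
    "une photo d'un chien": [0.0, 1.0, 0.0],   # French prompt
    "ein Foto eines Autos": [0.0, 0.0, 1.0],   # German prompt
}

best_label, scores = zero_shot_classify(image_embedding, label_embeddings)
print(best_label)
```

In a real pipeline the two sets of vectors would come from the model's image and text encoders. Only the comparison step is shown here.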
Cross-modal retrieval
Image-text matching
Match images and texts in a multilingual environment
Performs exceptionally well on the Crossmodal-3600 dataset
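Image-text matching in a shared embedding space usually comes down to ranking candidates by similarity to a query, and retrieval quality is commonly summarized with recall@k (the fraction of queries whose correct item appears in the top k results). The following is a minimal sketch under the assumption that text and image embeddings are already available. The image IDs, the toy vectors, and the helper names are hypothetical and are not taken from the model or from the Crossmodal-3600 benchmark.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def rank_images(text_embedding, image_embeddings, top_k=2):
    """Return up to top_k (image_id, score) pairs, highest cosine similarity first."""
    query = normalize(text_embedding)
    scored = []
    for image_id, emb in image_embeddings.items():
        candidate = normalize(emb)
        score = sum(x * y for x, y in zip(query, candidate))
        scored.append((image_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def recall_at_k(queries, image_embeddings, k):
    """Fraction of (text_embedding, correct_image_id) queries with the correct image in the top k."""
    hits = 0
    for text_embedding, correct_id in queries:
        top_ids = [image_id for image_id, _ in rank_images(text_embedding, image_embeddings, top_k=k)]
        if correct_id in top_ids:
            hits += 1
    return hits / len(queries)


# Toy embeddings standing in for real encoder outputs (assumption).
image_embeddings = {
    "img_beach": [0.8, 0.2, 0.0],
    "img_forest": [0.1, 0.9, 0.1],
    "img_city": [0.0, 0.2, 0.9],
}
queries = [
    ([1.0, 0.0, 0.0], "img_beach"),
    ([0.0, 1.0, 0.0], "img_forest"),
    ([0.5, 0.6, 0.0], "img_beach"),  # ambiguous caption: the forest image ranks first
]

print(rank_images(queries[0][0], image_embeddings, top_k=2))
print(recall_at_k(queries, image_embeddings, k=1))
print(recall_at_k(queries, image_embeddings, k=2))
```

The same ranking works in the other direction (image query, text candidates) by swapping the roles of the two embedding sets.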