NLLB-CLIP-Base-OC

Developed by visheratin
NLLB-CLIP is a multilingual vision-language model that combines the NLLB text encoder with the CLIP image encoder, supporting 201 languages.
Downloads: 371
Release date: 10/7/2023

Model Overview

This model integrates the text encoding capabilities of NLLB with the image encoding capabilities of CLIP, extending vision-language understanding to a broad range of languages and performing particularly well on low-resource languages.
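Since this checkpoint is published in OpenCLIP format (the "oc" suffix presumably marks the OpenCLIP-compatible export), it can be loaded through open_clip's Hugging Face Hub integration. Below is a minimal loading sketch; the hub identifier visheratin/nllb-clip-base-oc and the tokenizer.set_language helper for picking a Flores-200 language code follow the conventions on the model card and should be verified there:

```python
import torch
from PIL import Image
from open_clip import create_model_from_pretrained, get_tokenizer

# Load the model and its matching image preprocessing transform from the Hub.
model, transform = create_model_from_pretrained("hf-hub:visheratin/nllb-clip-base-oc")
tokenizer = get_tokenizer("hf-hub:visheratin/nllb-clip-base-oc")

# NLLB addresses languages by Flores-200 codes; set_language is assumed here
# to tell the tokenizer which code to use (eng_Latn = English).
tokenizer.set_language("eng_Latn")

image = transform(Image.open("example.jpg")).unsqueeze(0)  # placeholder image path
text = tokenizer(["a photo of a cat"])

with torch.no_grad():
    image_features = model.encode_image(image)  # shape: (1, embed_dim)
    text_features = model.encode_text(text)     # shape: (1, embed_dim)
```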

Model Features

Multilingual support: Supports 201 languages from Flores-200, including many low-resource languages.
Cross-modal understanding: Combines text and image encoding capabilities to achieve vision-language alignment.
Low-resource language optimization: Achieves state-of-the-art results on low-resource languages.

Model Capabilities

Multilingual image classification
Cross-modal retrieval
Zero-shot learning
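These capabilities combine into the standard zero-shot classification recipe: embed an image and a set of candidate labels, then rank the labels by cosine similarity. A sketch under the same loading assumptions as above (the file name and labels are placeholders; non-English labels would additionally need the matching Flores-200 code set on the tokenizer):

```python
import torch
from PIL import Image
from open_clip import create_model_from_pretrained, get_tokenizer

model, transform = create_model_from_pretrained("hf-hub:visheratin/nllb-clip-base-oc")
tokenizer = get_tokenizer("hf-hub:visheratin/nllb-clip-base-oc")

# Candidate labels; NLLB-CLIP accepts text in any of its 201 supported languages.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a bird"]

image = transform(Image.open("example.jpg")).unsqueeze(0)  # placeholder image path
text = tokenizer(labels)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize, then cosine similarity -> softmax gives zero-shot probabilities.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print({label: p.item() for label, p in zip(labels, probs[0])})
```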

Use Cases

Multilingual content understanding
Multilingual image captioning: Generates descriptive captions for images in multiple languages; performs well on the Crossmodal-3600 benchmark.
Cross-language image search: Retrieves relevant images using queries written in different languages (see the retrieval sketch after this section).
Low-resource language applications
Low-resource language image classification: Classifies images in low-resource language settings, achieving SOTA performance on low-resource languages.
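The cross-language search flow can be sketched as follows: encode a query in one language, encode the candidate images, and rank images by similarity. The image paths are placeholders, and the spa_Latn (Spanish) code plus the set_language call are assumptions carried over from the loading sketch above:

```python
import torch
from PIL import Image
from open_clip import create_model_from_pretrained, get_tokenizer

model, transform = create_model_from_pretrained("hf-hub:visheratin/nllb-clip-base-oc")
tokenizer = get_tokenizer("hf-hub:visheratin/nllb-clip-base-oc")

# Query in Spanish; the Flores-200 code tells the NLLB tokenizer its language.
tokenizer.set_language("spa_Latn")  # assumed helper, as on the model card
query = tokenizer(["un gato durmiendo en un sofá"])

# Placeholder image collection to search over.
paths = ["img1.jpg", "img2.jpg", "img3.jpg"]
images = torch.stack([transform(Image.open(p)) for p in paths])

with torch.no_grad():
    text_features = model.encode_text(query)
    image_features = model.encode_image(images)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    scores = (image_features @ text_features.T).squeeze(1)

# Rank images by similarity to the query, best match first.
for score, path in sorted(zip(scores.tolist(), paths), reverse=True):
    print(f"{path}: {score:.3f}")
```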