nllb-clip-large-siglip: an open-source multilingual vision-language model

Nllb Clip Large Siglip

Developed by visheratin

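NLLB-CLIP-SigLIP is a multilingual vision-language model. It combines the text encoder of the NLLB model with the image encoder of the SigLIP model and supports 201 languages.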

Tags: #MultilingualZeroShotClassification #CrossModalRetrieval #LowResourceLanguageSupport

Release date: 11/14/2023

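Model overview

The model combines NLLB's text encoding with SigLIP's image encoding. It is particularly strong on cross-modal tasks in low-resource languages and performs well on the Crossmodal-3600 dataset.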

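Model features

Multilingual support

Supports the 201 languages of Flores-200, including many low-resource languages

Cross-modal capabilities

Combines text and image encoders and is well suited to image-text matching tasks

Performance on low-resource languages

Achieves state-of-the-art results on low-resource languages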
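In a dual-encoder model like this one, image-text matching comes down to comparing embeddings. Each encoder maps its input into a shared vector space, and matching pairs should point in similar directions. The sketch below is illustrative only and makes several assumptions:

- It uses pure Python.
- The hypothetical 3-dimensional toy vectors stand in for real encoder outputs. In OpenCLIP these would typically come from model.encode_image and model.encode_text.
- It computes the cosine similarity between every image and every text.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def similarity_matrix(image_embs, text_embs):
    """Cosine similarity of every image (rows) against every text (columns)."""
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(img, txt)) for txt in texts] for img in images]

# Hypothetical toy embeddings; real ones are much larger and come from the two encoders
image_embs = [[0.9, 0.1, 0.0]]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
sims = similarity_matrix(image_embs, text_embs)

# Index of the text that best matches the single image
best_text = max(range(len(text_embs)), key=lambda j: sims[0][j])
print(sims, best_text)
```

SigLIP additionally scales these similarities by a learned temperature and adds a learned bias before turning them into probabilities. The sketch leaves that step out.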

Model capabilities

Multilingual image classification

Cross-lingual image retrieval

Zero-shot learning
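Zero-shot classification needs no training on the target labels. The candidate labels are simply written as text at inference time, and they can be in any supported language.

One way to exploit this is to give several translations per class and average their scores. This is a common prompt-ensembling trick; it is an assumption here, not something this model card describes. The sketch uses hypothetical class IDs and made-up scores.

```python
from collections import defaultdict
from statistics import mean

def ensemble_by_class(entries):
    """Average the scores of all translations that describe the same class.

    `entries` is a list of (class_id, lang_code, text, score) tuples; the class IDs
    are hypothetical names chosen by the caller.
    """
    per_class = defaultdict(list)
    for class_id, _lang, _text, score in entries:
        per_class[class_id].append(score)
    return {class_id: mean(scores) for class_id, scores in per_class.items()}

# Made-up scores for one image; in practice they would come from the model
entries = [
    ("butterfly", "rus_Cyrl", "бабочка", 0.30),
    ("butterfly", "eng_Latn", "butterfly", 0.34),
    ("cat", "afr_Latn", "kat", 0.05),
    ("cat", "eng_Latn", "cat", 0.07),
]
class_scores = ensemble_by_class(entries)

# Zero-shot prediction: the class with the highest averaged score
best = max(class_scores, key=class_scores.get)
print(best, class_scores)
```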

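Use cases

Multilingual content understanding

Multilingual image classification

Classify images using text labels written in different languages

Performs well on the Crossmodal-3600 dataset

Cross-lingual image retrieval

Retrieve relevant images with queries written in different languages

Supports queries in 201 languages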
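Cross-lingual retrieval runs in the opposite direction from classification: one text query is scored against many images, and the best-scoring images are returned.

Here is a minimal sketch. It assumes the query-image scores have already been computed and stored in a dict keyed by hypothetical image file names.

```python
import heapq

def retrieve_top_k(query_scores, k):
    """Return the k image IDs with the highest scores for one text query.

    `query_scores` maps a (hypothetical) image ID to its similarity with the query.
    """
    return heapq.nlargest(k, query_scores, key=query_scores.get)

# Made-up scores of a single query, in any supported language, against a small image index
scores = {"img_001.jpg": 0.12, "img_002.jpg": 0.31, "img_003.jpg": -0.05, "img_004.jpg": 0.27}
print(retrieve_top_k(scores, k=2))
```

heapq.nlargest avoids sorting the whole index when k is much smaller than the number of images. For large collections, an approximate nearest-neighbour index would normally replace the dict.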

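🚀 NLLB-CLIP-SigLIP model

The NLLB-CLIP-SigLIP model combines the text encoder of the NLLB model with the image encoder of the SigLIP model, extending the model's capabilities to the 201 languages of Flores-200. It achieves state-of-the-art results on the Crossmodal-3600 dataset, especially on low-resource languages. More details about the model are available in the paper.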
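Languages are selected with Flores-200 codes such as "rus_Cyrl" or "eng_Latn". Each code is a three-letter ISO 639-3 language code followed by a four-letter ISO 15924 script code.

The helper below is a hypothetical sketch. It only checks this shape and does not verify that a code is among the 201 supported languages.

```python
import re

# Flores-200 style code: three lowercase letters, underscore, capitalized four-letter script
FLORES_CODE = re.compile(r"([a-z]{3})_([A-Z][a-z]{3})")

def parse_flores_code(code: str) -> tuple[str, str]:
    """Split a Flores-200 style code into (language, script); raise on malformed input."""
    match = FLORES_CODE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a Flores-200 style code: {code!r}")
    return match.group(1), match.group(2)

for code in ["rus_Cyrl", "eng_Latn", "afr_Latn"]:
    print(code, "->", parse_flores_code(code))
```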

🚀 Quick start

The model is integrated into OpenCLIP and can be used like any other OpenCLIP model. An example notebook is also available on Colab.

📦 Installation

First, install the open_clip_torch library:

!pip install -U open_clip_torch

💻 Usage examples

Basic usage

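from open_clip import create_model_from_pretrained, get_tokenizer
from PIL import Image
import requests
import torch

# Use the GPU when available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the "v1" pretrained weights and the matching image preprocessing transform
model, transform = create_model_from_pretrained("nllb-clip-large-siglip", "v1", device=device)
model.eval()

# The NLLB tokenizer must be told the language of each input text
tokenizer = get_tokenizer("nllb-clip-large-siglip")

# Candidate labels with their Flores-200 codes: Russian "butterfly", English "butterfly", Afrikaans "cat"
class_options = ["бабочка", "butterfly", "kat"]
class_langs = ["rus_Cyrl", "eng_Latn", "afr_Latn"]

text_inputs = []
for text, lang in zip(class_options, class_langs):
    tokenizer.set_language(lang)  # switch the source language before tokenizing
    text_inputs.append(tokenizer(text))  # shape: [1, context_length]
text_inputs = torch.stack(text_inputs).squeeze(1).to(device)  # shape: [3, context_length]

image_path = "https://huggingface.co/spaces/jjourney1125/swin2sr/resolve/main/samples/butterfly.jpg"
response = requests.get(image_path, stream=True)
response.raise_for_status()
image = Image.open(response.raw)

# Preprocess the image and add a batch dimension
image_inputs = transform(image).unsqueeze(0).to(device)

with torch.inference_mode():
    logits_per_image, logits_per_text = model.get_logits(image_inputs, text_inputs)

# Relative scores of the single image over the three candidate labels
print(logits_per_image.softmax(dim=-1))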
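SigLIP was trained with a pairwise sigmoid loss rather than CLIP's softmax contrastive loss. This affects how the printed scores in the example above should be read:

- The softmax there yields relative scores that always sum to 1 across the listed labels.
- As a result, two correct labels (here the Russian and English words for butterfly) have to share the probability mass.
- Applying a sigmoid to each logit instead gives an independent match probability per image-label pair.

In PyTorch that would be logits_per_image.sigmoid(). This assumes get_logits already applies the learned scale and bias, which may depend on the OpenCLIP version. The pure-Python sketch below compares both normalizations on hypothetical logits.

```python
import math

def softmax(logits):
    """Normalize logits into a distribution that sums to 1 across all labels."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Independent match probability for a single logit (numerically stable)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

# Hypothetical logits of one image against ["бабочка", "butterfly", "kat"]
logits = [4.0, 5.0, -8.0]
print("softmax:", [round(p, 3) for p in softmax(logits)])
print("sigmoid:", [round(sigmoid(x), 3) for x in logits])
```

With these made-up numbers, softmax splits the mass between the two butterfly labels (about 0.27 and 0.73). The sigmoid scores both of them close to 1 and the cat label close to 0.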