nllb-clip-base-siglip: an open-source multilingual vision-language model

Nllb Clip Base Siglip

Developed by visheratin

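NLLB-CLIP-SigLIP is a multilingual vision-language model that pairs the text encoder of the NLLB model with the image encoder of the SigLIP model and supports 201 languages.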

Text-to-image #Multilingual zero-shot classification #Cross-modal retrieval #Low-resource language processing

Downloads: 478

Released: 11/14/2023

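Model overview

The model combines NLLB's text encoding with SigLIP's image encoding. It is particularly strong on low-resource languages and performs well on cross-modal tasks.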

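Model features

Multilingual support

Supports the 201 languages of Flores-200, with particular strength in low-resource languages

Cross-modal capability

Combines text and image encoding, making it suitable for cross-modal tasks

Strong performance

Sets a new state of the art on the Crossmodal-3600 dataset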

Model capabilities

Zero-shot image classification

Multilingual text understanding

Cross-modal retrieval

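Use cases

Multilingual applications

Multilingual image classification

Classify images using labels written in different languages

Performs well across many languages

Cross-modal retrieval

Image-text matching

Match images and text in multilingual settings

Performs strongly on the Crossmodal-3600 dataset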
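
Image-text matching of this kind typically comes down to ranking candidates by the similarity of their embeddings. The sketch below illustrates that idea with plain Python and toy 3-dimensional vectors. The names `cosine_similarity` and `retrieve`, the file names, and the vector values are all assumptions made for illustration; they are not part of the model or of OpenCLIP, and real embeddings would come from the model's encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, candidates, top_k=1):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(query_embedding, embedding))
        for name, embedding in candidates.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real image and text embeddings (hypothetical values).
image_embeddings = {
    "butterfly.jpg": [0.9, 0.1, 0.0],
    "cat.jpg": [0.1, 0.9, 0.1],
}
text_embedding = [0.8, 0.2, 0.1]

for name, score in retrieve(text_embedding, image_embeddings, top_k=2):
    print(f"{name}: {score:.3f}")
```

The same ranking works in either direction: a text query against image embeddings, or an image query against text embeddings in any supported language.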

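🚀 NLLB-CLIP-SigLIP model

The NLLB-CLIP-SigLIP model combines the strengths of a text encoder and an image encoder to extend its capabilities to 201 languages. It performs well on low-resource languages and offers a new option for tasks such as cross-lingual image classification.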

🚀 Quick start

The NLLB-CLIP-SigLIP model combines a text encoder from the [NLLB model](https://huggingface.co/facebook/nllb-200-distilled-600M) with an image encoder from the [SigLIP](https://huggingface.co/timm/ViT-B-16-SigLIP-384) model. This extends the model's capabilities to the 201 languages of Flores-200. NLLB-CLIP achieves state-of-the-art results on the [Crossmodal-3600](https://google.github.io/crossmodal-3600/) dataset and performs well on low-resource languages. More details about the model are available in the paper.

This version performs better than the [standard](https://huggingface.co/visheratin/nllb-clip-base-oc) version. You can see the results here and [here](https://github.com/gregor-ge/Babel-ImageNet/blob/main/evaluation_scripts/results_analysis.ipynb).

Note: a [better version](https://huggingface.co/visheratin/nllb-siglip-mrl-base) of this model is also available!

📦 Installation

The model is integrated into OpenCLIP, so you can use it like any other model. First, install the required library:

!pip install -U open_clip_torch

💻 Usage example

Basic usage

The following example shows how to use the model:

from open_clip import create_model_from_pretrained, get_tokenizer
from PIL import Image
import requests
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pretrained model together with its image preprocessing transform.
model, transform = create_model_from_pretrained("nllb-clip-base-siglip", "v1", device=device)

tokenizer = get_tokenizer("nllb-clip-base-siglip")

# Candidate labels and the Flores-200 language code of each one.
class_options = ["бабочка", "butterfly", "kat"]
class_langs = ["rus_Cyrl", "eng_Latn", "afr_Latn"]

# Tokenize each label after switching the tokenizer to that label's language.
text_inputs = []
for option, lang in zip(class_options, class_langs):
    tokenizer.set_language(lang)
    text_inputs.append(tokenizer(option))
text_inputs = torch.stack(text_inputs).squeeze(1).to(device)

image_path = "https://huggingface.co/spaces/jjourney1125/swin2sr/resolve/main/samples/butterfly.jpg"
image = Image.open(requests.get(image_path, stream=True).raw)

# Preprocess the image and add a batch dimension.
image_inputs = transform(image).unsqueeze(0).to(device)

with torch.inference_mode():
    logits_per_image, logits_per_text = model.get_logits(image_inputs, text_inputs)

# Probabilities over the candidate labels for the image.
print(logits_per_image.softmax(dim=-1))
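
The final line of the example prints one softmax probability per candidate label, in the same order as `class_options`. The sketch below shows one way to map such scores back to readable labels. It uses only the standard library and made-up logit values (an assumption, not real model output); the helper names `softmax` and `rank_labels` are hypothetical and are not part of OpenCLIP.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def rank_labels(logits, labels):
    """Pair each label with its probability, highest probability first."""
    probs = softmax(logits)
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


# Hypothetical logits for a single image; NOT real model output.
example_logits = [21.5, 22.0, 14.0]
class_options = ["бабочка", "butterfly", "kat"]

for label, prob in rank_labels(example_logits, class_options):
    print(f"{label}: {prob:.3f}")
```

With real model output, the list passed as `logits` could be obtained from the first row of the logits tensor, for example via `logits_per_image[0].tolist()`.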

📄 License

This model is released under the cc-by-nc-4.0 license.

🔗 Related information

Property	Details
Model type	NLLB-CLIP-SigLIP
Training data	visheratin/laion-coco-nllb
New version	visheratin/mexma-siglip2
Tags	clip
Library name	open_clip
Task type	zero-shot-image-classification