japanese-cloob-vit-b-16 Open-Source Model: Cross-Modal Understanding of Japanese Text and Images

Japanese Cloob Vit B 16

Developed by rinna

A Japanese CLOOB (Contrastive Leave One Out Boost) model trained by rinna Co., Ltd. for cross-modal understanding of images and text

Image-Text Matching

Transformers

Language: Japanese · License: Apache-2.0 · #JapaneseMultimodal #ImageTextMatching #ZeroShotClassification

Downloads: 229.51k

Published: 4/27/2022

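Model Overview

The model is based on the CLOOB architecture. It relates Japanese text to images and supports tasks such as image classification and text-image matching.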

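Model Features

Japanese cross-modal understanding

A vision-language model built specifically for Japanese that relates Japanese text to image content

CLOOB architecture

Uses the Contrastive Leave One Out Boost (CLOOB) method to improve cross-modal representation learning

Pre-trained ViT model

The image encoder is initialized from the AugReg vit-base-patch16-224 model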
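The "leave one out" idea behind the CLOOB feature listed above can be illustrated with a small sketch. It assumes a precomputed similarity matrix with matched image-text pairs on the diagonal, and the temperature `tau` is an illustrative value, not the one used to train this model. The published CLOOB method also applies the objective in both directions and to Hopfield-retrieved embeddings, which this sketch omits.

```python
import math

def info_loob_one_direction(sim, tau=0.3):
    """Mean leave-one-out contrastive loss over anchors (rows of sim).

    sim[i][j] is the similarity between anchor i and candidate j;
    the matching pair sits on the diagonal. tau is an illustrative
    temperature (an assumption of this sketch).
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        positive = sim[i][i] / tau
        # Leave one out: the denominator skips the positive pair (j == i).
        negatives = [sim[i][j] / tau for j in range(n) if j != i]
        m = max(negatives)  # shift for a numerically stable log-sum-exp
        log_sum_neg = m + math.log(sum(math.exp(v - m) for v in negatives))
        total += -(positive - log_sum_neg)
    return total / n

# Toy similarity matrices with made-up numbers.
aligned = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.0, 0.1, 0.9]]
shuffled = [[0.1, 0.9, 0.0], [0.8, 0.2, 0.1], [0.0, 0.9, 0.1]]

print(info_loob_one_direction(aligned))
print(info_loob_one_direction(shuffled))
```

The loss is lower when the matched pairs score higher than the mismatched ones, which is the behavior the training objective rewards.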

Model Capabilities

Image feature extraction

Text feature extraction

Image-text matching

Cross-modal retrieval

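Use Cases

Image classification

Animal image classification

Identify the animal in an image from a set of candidate Japanese labels (for example 犬 "dog", 猫 "cat", 象 "elephant")

In the usage example, the model assigns a probability of 1.0 to the label 犬 for a single dog photo. This is one sample, not a measured accuracy.

Cross-modal retrieval

Text-to-image retrieval

Retrieve relevant images from a Japanese text description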
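A minimal sketch of the ranking step in text-to-image retrieval, assuming the image and text embeddings have already been computed. The three-dimensional vectors, the file names, and the helper `rank_images` are hypothetical; real embeddings would come from the model's image and text encoders.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_images(text_vec, image_vecs, top_k=2):
    """Return the top_k (image_id, score) pairs, best match first."""
    scored = [(image_id, cosine(text_vec, vec)) for image_id, vec in image_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical embeddings standing in for encoder outputs.
image_index = {
    "dog.jpg": [0.9, 0.1, 0.0],
    "cat.jpg": [0.1, 0.9, 0.1],
    "elephant.jpg": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stand-in for the embedding of a query such as "犬"

results = rank_images(query, image_index)
print([image_id for image_id, _ in results])
```

For a large image collection, the image embeddings would typically be computed once and stored, so that only the query text needs encoding at search time.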

🚀 rinna/japanese-cloob-vit-b-16

This is a Japanese CLOOB (Contrastive Leave One Out Boost) model trained by rinna Co., Ltd.

See japanese-clip for the other available models.

🚀 Quick Start

📦 Installation

Install the required package:

$ pip install git+https://github.com/rinnakk/japanese-clip.git
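To confirm the installation before running the usage example, a small check like the following may help. The helper name `is_installed` is made up for this sketch; the module name `japanese_clip` is the one imported in the usage example.

```python
import importlib.util

def is_installed(module_name):
    """Return True if the named top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None

if not is_installed("japanese_clip"):
    print("japanese_clip not found; run the pip command above first")
```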

💻 Usage Example

Basic Usage

import io
import requests
from PIL import Image
import torch
import japanese_clip as ja_clip

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model with its image preprocessing pipeline, then the tokenizer
model, preprocess = ja_clip.load("rinna/japanese-cloob-vit-b-16", device=device)
tokenizer = ja_clip.load_tokenizer()

# Download a sample photo and turn it into a batch of one image tensor
img = Image.open(io.BytesIO(requests.get('https://images.pexels.com/photos/2253275/pexels-photo-2253275.jpeg?auto=compress&cs=tinysrgb&dpr=3&h=750&w=1260').content))
image = preprocess(img).unsqueeze(0).to(device)
# Tokenize the candidate labels: 犬 (dog), 猫 (cat), 象 (elephant)
encodings = ja_clip.tokenize(
    texts=["犬", "猫", "象"],
    max_seq_len=77,
    device=device,
    tokenizer=tokenizer,  # optional; if omitted, the tokenizer is loaded on every call
)

with torch.no_grad():
    image_features = model.get_image_features(image)
    text_features = model.get_text_features(**encodings)

    # Scale the image-text similarities by 100 and normalize across the labels
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)  # prints: [[1.0, 0.0, 0.0]]
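The last step of the example above, scaling the similarities by 100 and applying a softmax, can be reproduced without the model to see why the output is so sharply peaked. The similarity values below are made up for illustration; only the labels and the factor of 100 come from the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["犬", "猫", "象"]
# Made-up similarities between one image and the three labels.
similarities = [0.31, 0.12, 0.08]

probs = softmax([100.0 * s for s in similarities])
best = labels[probs.index(max(probs))]
print(best, [round(p, 4) for p in probs])
```

Because the similarities are multiplied by 100 before the softmax, even a modest gap between the best label and the rest pushes the winning probability very close to 1.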

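🔧 Technical Details

Model Architecture

The model uses a ViT-B/16 Transformer as the image encoder and a 12-layer BERT as the text encoder. The image encoder was initialized from the AugReg vit-base-patch16-224 model.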
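As a quick illustration of what the "/16" and "224" in the encoder name imply, the arithmetic below counts the patches the image encoder sees. The extra class token is an assumption based on the standard ViT design, not something stated in this model card.

```python
image_size = 224  # input resolution, from the checkpoint name vit-base-patch16-224
patch_size = 16   # side length of each square patch, from the same name

patches_per_side = image_size // patch_size
num_patches = patches_per_side ** 2
seq_len = num_patches + 1  # assuming the standard ViT class token

print(patches_per_side, num_patches, seq_len)
```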

Training Data

The model was trained on CC12M, with the captions translated into Japanese.

Release Date

May 12, 2022

How to Cite

@misc{rinna-japanese-cloob-vit-b-16,
    title = {rinna/japanese-cloob-vit-b-16},
    author = {Shing, Makoto and Zhao, Tianyu and Sawada, Kei},
    url = {https://huggingface.co/rinna/japanese-cloob-vit-b-16}
}

@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    pages = {13898--13905},
    url = {https://aclanthology.org/2024.lrec-main.1213},
    note = {\url{https://arxiv.org/abs/2404.01657}}
}