heron-chat-blip-ja-stablelm-base-7b-v1-llava-620k: an open-source model for Japanese-language conversation about images

Heron Chat Blip Ja Stablelm Base 7b V1 Llava 620k

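Developed by turing-motors

A vision-language model that can converse about an input image, with support for Japanese

Image-to-Text

Transformers

Japanese #Japanese visual question answering #Image dialogue generation #Multimodal Japanese processing

Downloads: 25

Released: 2/27/2024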

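Model Overview

The model is based on the BLIP2 architecture combined with the Japanese StableLM Base Alpha language model. It takes an image as input and holds a natural-language conversation about it.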

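Model Features

Japanese visual dialogue

Visual question answering tuned specifically for Japanese

Efficient architecture

Combines the BLIP2 vision encoder with the StableLM language model

Full fine-tuning

Trained on the LLaVA-Instruct-620K-JA dataset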

Model Capabilities

Image understanding

Japanese dialogue

Visual question answering

Image caption generation

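Use Cases

Chat applications

Image dialogue chatbot

Users upload an image and then talk with the AI about its content

The model interprets the image content and generates relevant answers

Research

Multimodal research

For research on vision-language models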

🚀 Heron BLIP Japanese StableLM Base 7B llava-620k

Heron BLIP Japanese StableLM Base 7B is a vision-language model that can converse about input images.

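🚀 Quick Start

Follow the installation guide. The usage example imports from heron.models.video_blip, so the heron package must be installed first.

💻 Usage Example

Basic Usage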

import torch
from heron.models.video_blip import VideoBlipForConditionalGeneration, VideoBlipProcessor
from transformers import LlamaTokenizer

device_id = 0
device = f"cuda:{device_id}"

MODEL_NAME = "turing-motors/heron-chat-blip-ja-stablelm-base-7b-v1"
    
model = VideoBlipForConditionalGeneration.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, ignore_mismatched_sizes=True
)

model = model.half()
model.eval()
model.to(device)

# prepare a processor
processor = VideoBlipProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")
tokenizer = LlamaTokenizer.from_pretrained("novelai/nerdstash-tokenizer-v1", additional_special_tokens=['▁▁'])
processor.tokenizer = tokenizer

import requests
from PIL import Image

# prepare inputs
url = "https://www.barnorama.com/wp-content/uploads/2016/12/03-Confusing-Pictures.jpg"
image = Image.open(requests.get(url, stream=True).raw)

text = "##human: この画像の面白い点は何ですか?\n##gpt: "

# do preprocessing
inputs = processor(
    text=text,
    images=image,
    return_tensors="pt",
    truncation=True,
)

inputs = {k: v.to(device) for k, v in inputs.items()}
inputs["pixel_values"] = inputs["pixel_values"].to(device, torch.float16)

# set eos token
eos_token_id_list = [
    processor.tokenizer.pad_token_id,
    processor.tokenizer.eos_token_id,
    int(tokenizer.convert_tokens_to_ids("##"))
]

# do inference
with torch.no_grad():
    out = model.generate(**inputs, max_length=256, do_sample=False, temperature=0., eos_token_id=eos_token_id_list, no_repeat_ngram_size=2)

# print result
print(processor.tokenizer.batch_decode(out))
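
The example above hard-codes the prompt as `##human: ...\n##gpt: ` and prints the raw decoded output, which typically still contains the prompt itself. The sketch below shows one way to build that prompt from a question and to pull out just the answer text. The helper names `build_prompt` and `extract_answer` are hypothetical and not part of heron or Transformers, and the assumption that the decoded string echoes the prompt and may end in a `##` marker is inferred from the example's stop-token list, not confirmed by the model card.

```python
HUMAN_TAG = "##human: "
GPT_TAG = "##gpt: "


def build_prompt(question: str) -> str:
    """Wrap one user question in the ##human / ##gpt template used above."""
    return f"{HUMAN_TAG}{question.strip()}\n{GPT_TAG}"


def extract_answer(decoded: str) -> str:
    """Return the text after the last '##gpt:' tag, cut at the next '##'.

    Assumption: the decoded string echoes the prompt and the model may emit
    a trailing '##' (the example above uses '##' as a stop token).
    """
    # Keep only what follows the final "##gpt:" tag; if the tag is absent,
    # rpartition leaves the whole string in `tail`.
    _, _, tail = decoded.rpartition("##gpt:")
    # Drop everything from the next "##" marker onward.
    answer = tail.split("##", 1)[0]
    return answer.strip()


if __name__ == "__main__":
    prompt = build_prompt("この画像の面白い点は何ですか?")
    print(repr(prompt))
    # Simulated decoded output: the prompt followed by a made-up answer.
    print(extract_answer(prompt + "猫がソファに座っています。##"))
```

With the original code, this would be applied as `extract_answer(processor.tokenizer.batch_decode(out, skip_special_tokens=True)[0])`. Passing `skip_special_tokens=True` to `batch_decode` strips padding and end-of-sequence tokens before the string is parsed.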

📚 Documentation

Model Details

Developed by: Turing Inc.
Adaptor type: BLIP2
Language model: Japanese StableLM Base Alpha
Supported language: Japanese

Training

This model was fully fine-tuned on LLaVA-Instruct-620K-JA.

Training Dataset

LLaVA-Instruct-620K-JA

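Use and Limitations

Intended Use

This model is intended for chat-like applications and for research purposes.

Limitations

The model may produce inaccurate or false information, and its accuracy is not guaranteed. It is still at the research and development stage.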

How to Cite

@misc{BlipJapaneseStableLM, 
    url    = {https://huggingface.co/turing-motors/heron-chat-blip-ja-stablelm-base-7b-v0}, 
    title  = {Heron BLIP Japanese StableLM Base 7B}, 
    author = {Kotaro Tanahashi, Yuichi Inoue, and Yu Yamaguchi}
}

Citations

@misc{JapaneseInstructBLIPAlpha, 
    url    = {https://huggingface.co/stabilityai/japanese-instructblip-alpha}, 
    title  = {Japanese InstructBLIP Alpha}, 
    author = {Shing, Makoto and Akiba, Takuya}
}