Heron BLIP Japanese StableLM Base 7B v1
A vision-language model that enables conversations about input images.
Quick Start
Follow the installation guide for the heron library.
Usage Examples
Basic Usage
import torch
from heron.models.video_blip import VideoBlipForConditionalGeneration, VideoBlipProcessor
from transformers import LlamaTokenizer

device_id = 0
device = f"cuda:{device_id}"

MODEL_NAME = "turing-motors/heron-chat-blip-ja-stablelm-base-7b-v1"

# Load the model in half precision and move it to the GPU
model = VideoBlipForConditionalGeneration.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, ignore_mismatched_sizes=True
)
model = model.half()
model.eval()
model.to(device)

# The processor handles the image preprocessing; its tokenizer is replaced with
# the Japanese StableLM tokenizer
processor = VideoBlipProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")
tokenizer = LlamaTokenizer.from_pretrained("novelai/nerdstash-tokenizer-v1", additional_special_tokens=['▁▁'])
processor.tokenizer = tokenizer
import requests
from PIL import Image

# Download an example image
url = "https://www.barnorama.com/wp-content/uploads/2016/12/03-Confusing-Pictures.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Prompt in the ##human: / ##gpt: chat format
# ("この画像の面白い点は何ですか?" = "What is interesting about this image?")
text = "##human: この画像の面白い点は何ですか?\n##gpt: "

inputs = processor(
    text=text,
    images=image,
    return_tensors="pt",
    truncation=True,
)

inputs = {k: v.to(device) for k, v in inputs.items()}
inputs["pixel_values"] = inputs["pixel_values"].to(device, torch.float16)
# Stop generation at the pad token, the EOS token, or the "##" token that starts the next turn
eos_token_id_list = [
    processor.tokenizer.pad_token_id,
    processor.tokenizer.eos_token_id,
    int(tokenizer.convert_tokens_to_ids("##")),
]

# Greedy decoding (do_sample=False)
with torch.no_grad():
    out = model.generate(**inputs, max_length=256, do_sample=False, temperature=0.0,
                         eos_token_id=eos_token_id_list, no_repeat_ngram_size=2)

print(processor.tokenizer.batch_decode(out))
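The decoded string still contains the prompt and the ##human:/##gpt: markers. As a minimal post-processing sketch (not part of the original example; the split markers are assumptions based on the prompt format above), the answer alone can be recovered like this:

# Hypothetical post-processing: keep only the text after "##gpt:" and before the next "##"
decoded = processor.tokenizer.batch_decode(out, skip_special_tokens=True)[0]
answer = decoded.split("##gpt:", 1)[-1].split("##", 1)[0].strip()
print(answer)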
Documentation
Model Details
Heron BLIP Japanese StableLM Base 7B is a vision-language model that can converse about input images. This model was trained using the heron library. Please refer to the code for details.
Training
This model was fully fine-tuned with LLaVA-Instruct-150K-JA.
Training Dataset
LLaVA-Instruct-150K-JA
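For reference, the training data can be inspected with the datasets library; the Hub repository id turing-motors/LLaVA-Instruct-150K-JA used below is an assumption and may need to be adjusted:

from datasets import load_dataset

# Assumed Hub id for LLaVA-Instruct-150K-JA; adjust if the dataset lives elsewhere.
dataset = load_dataset("turing-motors/LLaVA-Instruct-150K-JA", split="train")
print(dataset[0])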
Use and Limitations
Intended Use
This model is intended for use in chat-like applications and for research purposes.
Limitations
The model may produce inaccurate or false information, and its accuracy is not guaranteed. It is still in the research and development stage.
How to cite
@misc{BlipJapaneseStableLM,
    url = {https://huggingface.co/turing-motors/heron-chat-blip-ja-stablelm-base-7b-v0},
    title = {Heron BLIP Japanese StableLM Base 7B},
    author = {Kotaro Tanahashi and Yuichi Inoue and Yu Yamaguchi}
}
Citations
@misc{JapaneseInstructBLIPAlpha,
    url = {https://huggingface.co/stabilityai/japanese-instructblip-alpha},
    title = {Japanese InstructBLIP Alpha},
    author = {Shing, Makoto and Akiba, Takuya}
}
License
This project is licensed under the CC BY-NC 4.0 license.