heron-chat-blip-ja-stablelm-base-7b-v1-llava-620k Open Source Model - A Magical Tool for Image-Dialogue Interaction in Japanese


Heron Chat Blip Ja Stablelm Base 7b V1 Llava 620k

Developed by turing-motors

A vision-language model capable of conversing about input images, supporting Japanese interaction

Image-to-Text

Transformers

Japanese #Japanese Visual Question Answering #Image Dialogue Generation #Multimodal Japanese Processing

Downloads 25

Release Time: 2/27/2024

Model Overview

This model combines the BLIP2 architecture with the Japanese StableLM Base Alpha language model. It takes an image as input and holds a natural-language conversation about it.
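In BLIP-2-style models, the vision side emits a small, fixed number of query embeddings. A linear layer projects them to the language model's embedding width, and the projected vectors are prepended to the text token embeddings as a soft visual prompt. The pure-Python sketch below illustrates only that projection-and-prepend step with toy numbers. The function names and sizes are hypothetical, and heron's VideoBlip implementation may differ in detail.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # weight[i][j]: input feature i -> output feature j


def project(vec: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Linear layer y = x W + b, mapping one query embedding to the LM width."""
    return [
        sum(vec[i] * weight[i][j] for i in range(len(vec))) + bias[j]
        for j in range(len(bias))
    ]


def build_lm_inputs(
    query_outputs: List[Vector], weight: Matrix, bias: Vector, text_embeddings: List[Vector]
) -> List[Vector]:
    """Project every visual query embedding, then prepend the results to the
    text token embeddings, giving one input sequence for the language model."""
    visual_tokens = [project(q, weight, bias) for q in query_outputs]
    return visual_tokens + text_embeddings


# Toy sizes (hypothetical): 2 query embeddings of width 3, LM width 2, 1 text token.
queries = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.5, -0.5]
text = [[9.0, 9.0]]

seq = build_lm_inputs(queries, W, b, text)
print(seq)  # [[3.5, 1.5], [1.5, 1.5], [9.0, 9.0]]
```

Because the visual tokens sit at the start of the sequence, the language model attends to them the same way it attends to ordinary prompt tokens.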

Model Features

Japanese Visual Dialogue

Visual question answering capability specifically optimized for Japanese

Efficient Architecture

Combines a BLIP2 vision adaptor with the Japanese StableLM Base Alpha language model

Comprehensive Fine-tuning

Fully fine-tuned on the LLaVA-Instruct-620K-JA dataset

Model Capabilities

Image Understanding

Japanese Conversation

Visual Question Answering

Image Caption Generation

Use Cases

Chat Applications

Image Chatbot

Users upload an image and converse with the AI about its content

Capable of understanding image content and generating relevant responses
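A chat front end has to turn the conversation so far into a prompt and then pull the new answer out of the decoded text. The sketch below reuses the "##human: " and "##gpt: " markers from the Basic Usage prompt on this page. Joining earlier turns with a newline is an assumption, not a documented multi-turn format, and build_prompt and extract_reply are hypothetical helpers.

```python
from typing import Iterable, List, Tuple

# Markers taken from the Basic Usage prompt. Separating earlier turns with
# "\n" is an assumption, not a documented multi-turn format.
HUMAN = "##human: "
GPT = "##gpt: "


def build_prompt(history: List[Tuple[str, str]], question: str) -> str:
    """Render finished (question, answer) turns, then open a new assistant
    turn so the model writes the next answer."""
    turns = [f"{HUMAN}{q}\n{GPT}{a}" for q, a in history]
    turns.append(f"{HUMAN}{question}\n{GPT}")
    return "\n".join(turns)


def extract_reply(decoded: str, strip_tokens: Iterable[str] = ()) -> str:
    """Keep the text after the last assistant marker, cut it at the next
    "##" (also a stop token in the usage example), and remove any special
    token strings passed in, such as the tokenizer's EOS token."""
    reply = decoded.rsplit(GPT.strip(), 1)[-1]
    reply = reply.split("##", 1)[0]
    for token in strip_tokens:
        reply = reply.replace(token, "")
    return reply.strip()


# "What is in this image?" / "A dog." / "What is the dog doing?"
history = [("この画像には何が写っていますか?", "犬が写っています。")]
prompt = build_prompt(history, "犬は何をしていますか?")
print(prompt)
# Pretend the model continued with "It is playing with a ball." and then "##".
print(extract_reply(prompt + "ボールで遊んでいます。##"))  # ボールで遊んでいます。
```

With an empty history, build_prompt produces exactly the single-turn prompt used in the usage example. With real output, pass the decoded string and, for example, strip_tokens=[processor.tokenizer.eos_token].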

Research Applications

Multimodal Research

Research on vision-language models

🚀 Heron BLIP Japanese StableLM Base 7B llava-620k

A vision-language model that can converse about input images, enabling engaging interactions with visual content.

🚀 Quick Start

Install the heron library by following its installation guide. The example below imports from heron.models.video_blip.

💻 Usage Examples

Basic Usage

```python
import requests
import torch
from PIL import Image
from transformers import LlamaTokenizer
from heron.models.video_blip import VideoBlipForConditionalGeneration, VideoBlipProcessor

device_id = 0
device = f"cuda:{device_id}"

MODEL_NAME = "turing-motors/heron-chat-blip-ja-stablelm-base-7b-v1-llava-620k"

# load the model in half precision and move it to the GPU
model = VideoBlipForConditionalGeneration.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, ignore_mismatched_sizes=True
)
model = model.half()
model.eval()
model.to(device)

# prepare a processor: BLIP2 image preprocessing plus the Japanese StableLM tokenizer
processor = VideoBlipProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")
tokenizer = LlamaTokenizer.from_pretrained("novelai/nerdstash-tokenizer-v1", additional_special_tokens=["▁▁"])
processor.tokenizer = tokenizer

# prepare inputs
url = "https://www.barnorama.com/wp-content/uploads/2016/12/03-Confusing-Pictures.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# prompt in the ##human / ##gpt chat format ("What is funny about this image?")
text = "##human: この画像の面白い点は何ですか?\n##gpt: "

# do preprocessing, then move tensors to the GPU (images as float16)
inputs = processor(text=text, images=image, return_tensors="pt", truncation=True)
inputs = {k: v.to(device) for k, v in inputs.items()}
inputs["pixel_values"] = inputs["pixel_values"].to(device, torch.float16)

# stop at the pad token, the EOS token, or the "##" that starts a new turn
eos_token_id_list = [
    processor.tokenizer.pad_token_id,
    processor.tokenizer.eos_token_id,
    int(tokenizer.convert_tokens_to_ids("##")),
]

# greedy decoding (temperature has no effect when do_sample=False),
# never repeating a bigram
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_length=256,
        do_sample=False,
        eos_token_id=eos_token_id_list,
        no_repeat_ngram_size=2,
    )

# print result
print(processor.tokenizer.batch_decode(out))
```
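The generate call in the usage example decodes greedily (do_sample=False), stops as soon as it emits any id in eos_token_id_list, and, with no_repeat_ngram_size=2, never emits a token that would repeat a bigram already in the sequence. The toy re-implementation below makes those three rules concrete against a fixed, hand-written scoring function instead of the real model. It is an illustrative sketch, not the Hugging Face transformers code path; banned_tokens, greedy_decode, and toy_scores are hypothetical names.

```python
from typing import Callable, Dict, List, Sequence, Set


def banned_tokens(tokens: Sequence[int], n: int) -> Set[int]:
    """Return every token that would complete an n-gram already present in
    `tokens` (the rule that no_repeat_ngram_size=n enforces)."""
    if n <= 0 or len(tokens) < n - 1:
        return set()
    prefix = tuple(tokens[len(tokens) - n + 1:])  # the last n-1 tokens
    banned = set()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned


def greedy_decode(
    score_fn: Callable[[List[int]], Dict[int, float]],
    prompt: List[int],
    stop_ids: Set[int],
    max_length: int,
    no_repeat_ngram_size: int = 0,
) -> List[int]:
    """Always take the highest-scoring allowed token. In this toy, decoding
    stops when any id in stop_ids is emitted (it stays in the output) or when
    the total length, prompt included, reaches max_length."""
    tokens = list(prompt)
    while len(tokens) < max_length:
        banned = banned_tokens(tokens, no_repeat_ngram_size)
        allowed = {t: s for t, s in score_fn(tokens).items() if t not in banned}
        if not allowed:  # every candidate is banned: nothing left to emit
            break
        next_id = max(allowed, key=allowed.__getitem__)
        tokens.append(next_id)
        if next_id in stop_ids:
            break
    return tokens


def toy_scores(tokens: List[int]) -> Dict[int, float]:
    """Hypothetical 'model' that ignores context: prefers 5, then 6, 7, 99."""
    return {5: 1.0, 6: 0.5, 7: 0.1, 99: 0.0}


print(greedy_decode(toy_scores, [1], stop_ids={99}, max_length=10))
# [1, 5, 5, 5, 5, 5, 5, 5, 5, 5]
print(greedy_decode(toy_scores, [1], stop_ids={99}, max_length=10, no_repeat_ngram_size=2))
# [1, 5, 5, 6, 5, 7, 5, 99]
```

The second run shows the trade-off of a small n-gram size. It breaks the 5, 5, 5 loop, but it also forbids any legitimate repeat of a two-token sequence, which in natural text can block pairs that a correct answer needs.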

📚 Documentation

Model Details

Heron BLIP Japanese StableLM Base 7B is a vision-language model that can converse about input images. It was trained with the heron library; see the heron source code for training details.

Property	Details
Developed by	Turing Inc.
Adaptor type	BLIP2
Language Model	Japanese StableLM Base Alpha
Language(s)	Japanese

Training

This model was fully fine-tuned on LLaVA-Instruct-620K-JA.

Training Dataset

LLaVA-Instruct-620K-JA

Use and Limitations

Intended Use

This model is intended for use in chat-like applications and for research purposes.

Limitations

The model may produce inaccurate or false information, and its accuracy is not guaranteed. It is still in the research and development stage.

How to cite

```bibtex
@misc{BlipJapaneseStableLM,
    url    = {https://huggingface.co/turing-motors/heron-chat-blip-ja-stablelm-base-7b-v0},
    title  = {Heron BLIP Japanese StableLM Base 7B},
    author = {Tanahashi, Kotaro and Inoue, Yuichi and Yamaguchi, Yu}
}
```

Citations

```bibtex
@misc{JapaneseInstructBLIPAlpha,
    url    = {https://huggingface.co/stabilityai/japanese-instructblip-alpha},
    title  = {Japanese InstructBLIP Alpha},
    author = {Shing, Makoto and Akiba, Takuya}
}
```

📄 License

This project is licensed under the CC BY-NC 4.0 license.
