Heron BLIP Japanese StableLM Base 7B v1
A vision-language model that enables conversations about input images.
Quick Start
Follow the installation guide for the heron library.
Usage Examples
Basic Usage
import torch
from heron.models.video_blip import VideoBlipForConditionalGeneration, VideoBlipProcessor
from transformers import LlamaTokenizer

device_id = 0
device = f"cuda:{device_id}"

MODEL_NAME = "turing-motors/heron-chat-blip-ja-stablelm-base-7b-v1"

# Load the model in half precision and move it to the GPU
model = VideoBlipForConditionalGeneration.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, ignore_mismatched_sizes=True
)
model = model.half()
model.eval()
model.to(device)

# The processor handles the image preprocessing; its tokenizer is replaced with
# the Japanese StableLM tokenizer
processor = VideoBlipProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")
tokenizer = LlamaTokenizer.from_pretrained("novelai/nerdstash-tokenizer-v1", additional_special_tokens=['▁▁'])
processor.tokenizer = tokenizer
import requests
from PIL import Image

# Download an example image
url = "https://www.barnorama.com/wp-content/uploads/2016/12/03-Confusing-Pictures.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Prompt in the ##human: / ##gpt: chat format
# ("この画像の面白い点は何ですか?" = "What is interesting about this image?")
text = "##human: この画像の面白い点は何ですか?\n##gpt: "

inputs = processor(
    text=text,
    images=image,
    return_tensors="pt",
    truncation=True,
)

inputs = {k: v.to(device) for k, v in inputs.items()}
inputs["pixel_values"] = inputs["pixel_values"].to(device, torch.float16)
# Stop generation at the pad token, the EOS token, or the "##" token that starts the next turn
eos_token_id_list = [
    processor.tokenizer.pad_token_id,
    processor.tokenizer.eos_token_id,
    int(tokenizer.convert_tokens_to_ids("##")),
]

# Greedy decoding (do_sample=False)
with torch.no_grad():
    out = model.generate(**inputs, max_length=256, do_sample=False, temperature=0.0,
                         eos_token_id=eos_token_id_list, no_repeat_ngram_size=2)

print(processor.tokenizer.batch_decode(out))
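The decoded string still contains the prompt and the ##human:/##gpt: markers. As a minimal post-processing sketch (not part of the original example; the split markers are assumptions based on the prompt format above), the answer alone can be recovered like this:

# Hypothetical post-processing: keep only the text after "##gpt:" and before the next "##"
decoded = processor.tokenizer.batch_decode(out, skip_special_tokens=True)[0]
answer = decoded.split("##gpt:", 1)[-1].split("##", 1)[0].strip()
print(answer)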
Documentation
Model Details
Heron BLIP Japanese StableLM Base 7B is a vision-language model that can converse about input images. This model was trained using the heron library. Please refer to the code for details.
Training
This model was fully fine-tuned with LLaVA-Instruct-150K-JA.
Training Dataset
LLaVA-Instruct-150K-JA
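For reference, the training data can be inspected with the datasets library; the Hub repository id turing-motors/LLaVA-Instruct-150K-JA used below is an assumption and may need to be adjusted:

from datasets import load_dataset

# Assumed Hub id for LLaVA-Instruct-150K-JA; adjust if the dataset lives elsewhere.
dataset = load_dataset("turing-motors/LLaVA-Instruct-150K-JA", split="train")
print(dataset[0])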
Use and Limitations
Intended Use
This model is intended for use in chat-like applications and for research purposes.
Limitations
The model may produce inaccurate or false information, and its accuracy is not guaranteed. It is still in the research and development stage.
How to cite
@misc{BlipJapaneseStableLM,
    url = {https://huggingface.co/turing-motors/heron-chat-blip-ja-stablelm-base-7b-v0},
    title = {Heron BLIP Japanese StableLM Base 7B},
    author = {Kotaro Tanahashi and Yuichi Inoue and Yu Yamaguchi}
}
Citations
@misc{JapaneseInstructBLIPAlpha,
    url = {https://huggingface.co/stabilityai/japanese-instructblip-alpha},
    title = {Japanese InstructBLIP Alpha},
    author = {Shing, Makoto and Akiba, Takuya}
}
License
This project is licensed under the CC BY-NC 4.0 license.