Heron Chat Git Ja StableLM Base 7B V1 Open-Source Vision-Language Model - Supports Japanese and Image-Interactive Dialogue

Developed by turing-motors

A vision-language model capable of conversing about input images, supporting Japanese interaction

Image-to-Text

Transformers

Japanese #Japanese Visual Dialogue #Image Caption Generation #Multimodal Q&A

Release Time : 3/29/2024

Model Overview

This vision-language model is based on the GIT architecture. It can interpret image content and hold dialogues in Japanese, and is used primarily for image caption generation and visual question answering.

Model Features

Vision-Language Understanding

Capable of understanding image content and generating relevant textual descriptions

Japanese Dialogue Capability

Dialogue generation capability specifically optimized for Japanese

End-to-End Training

The GIT adaptor and the language model are fine-tuned together to improve comprehension

Model Capabilities

Image understanding

Japanese dialogue

Visual question answering

Image caption generation

Use Cases

Chat Applications

Image-based Dialogue

Users upload images and engage in dialogue with the model about the image content

The model can understand image content and generate relevant responses

Assistive Tools

Image Caption Generation

Generates textual descriptions of images for visually impaired users

Provides accurate descriptions of image content

🚀 Heron GIT Japanese StableLM Base 7B

A vision-language model that can converse about input images, intended for chat-like applications and research.

🚀 Quick Start

Follow the installation guide.

💻 Usage Examples

Basic Usage

import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlamaTokenizer

from heron.models.git_llm.git_japanese_stablelm_alpha import GitJapaneseStableLMAlphaForCausalLM

device_id = 0
device = f"cuda:{device_id}"

MODEL_NAME = "turing-motors/heron-chat-git-ja-stablelm-base-7b-v1"

# load the model in half precision and move it to the GPU
model = GitJapaneseStableLMAlphaForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, ignore_mismatched_sizes=True
)
model.eval()
model.to(device)

# prepare a processor and replace its tokenizer with the nerdstash tokenizer
processor = AutoProcessor.from_pretrained(MODEL_NAME)
tokenizer = LlamaTokenizer.from_pretrained(
    "novelai/nerdstash-tokenizer-v1",
    padding_side="right",
    additional_special_tokens=["▁▁"],
)
processor.tokenizer = tokenizer

# prepare inputs: download the image and build the prompt
url = "https://www.barnorama.com/wp-content/uploads/2016/12/03-Confusing-Pictures.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# the question reads: "What is interesting about this image?"
text = "##human: この画像の面白い点は何ですか?\n##gpt: "

# do preprocessing
inputs = processor(
    text=text,
    images=image,
    return_tensors="pt",
    truncation=True,
)

# move every input tensor to the same device as the model
inputs = {k: v.to(device) for k, v in inputs.items()}

# do inference with greedy decoding (do_sample=False, so no temperature is needed)
with torch.no_grad():
    out = model.generate(**inputs, max_length=256, do_sample=False, no_repeat_ngram_size=2)

# print result
print(processor.tokenizer.batch_decode(out))
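
The example above writes the prompt by hand and prints the full decoded sequence, prompt included. The sketch below shows one way to build the `##human: ... \n##gpt: ` prompt and to pull only the model's reply out of the decoded string. The helper names `build_prompt` and `extract_reply` are hypothetical and not part of the heron library, and the multi-turn layout (earlier turns concatenated in the same format) is an assumption rather than documented behavior.

```python
from typing import List, Optional, Tuple

# Role tags taken from the prompt used in the usage example.
HUMAN_TAG = "##human: "
GPT_TAG = "##gpt: "


def build_prompt(question: str, history: Optional[List[Tuple[str, str]]] = None) -> str:
    """Format a question, plus optional earlier (question, answer) turns.

    The multi-turn layout is an assumption: earlier turns are simply
    concatenated in the same ##human/##gpt format.
    """
    turns = []
    for past_question, past_answer in history or []:
        turns.append(f"{HUMAN_TAG}{past_question}\n{GPT_TAG}{past_answer}\n")
    # The final turn ends with an open ##gpt: tag for the model to complete.
    turns.append(f"{HUMAN_TAG}{question}\n{GPT_TAG}")
    return "".join(turns)


def extract_reply(decoded: str) -> str:
    """Return the text after the last ##gpt: tag, cut at any later ##human: tag."""
    reply = decoded.rsplit(GPT_TAG.strip(), 1)[-1]
    reply = reply.split(HUMAN_TAG.strip(), 1)[0]
    return reply.strip()


if __name__ == "__main__":
    prompt = build_prompt("この画像の面白い点は何ですか?")
    print(repr(prompt))
    # Stand-in for a decoded model output; a real one comes from batch_decode.
    print(extract_reply(prompt + "example reply"))
```

With the usage example, `text = build_prompt(...)` would replace the hand-written prompt and `extract_reply(processor.tokenizer.batch_decode(out)[0])` would isolate the answer. Passing `skip_special_tokens=True` to `batch_decode` may also help keep end-of-sequence markers out of the reply.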

📚 Documentation

Model Details

Heron GIT Japanese StableLM Base 7B is a vision-language model that can converse about input images. This model was trained using the heron library. Please refer to the code for details.

Property	Details
Developed by	Turing Inc.
Adaptor type	GIT
Language Model	Japanese StableLM Base Alpha
Languages	Japanese

Training

The GIT adaptor was trained with LLaVA-Pretrain-JA.
The LLM and the adaptor were then fully fine-tuned with LLaVA-Instruct-620K-JA-v2.
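
The two training stages can be summarized as data. This is a minimal sketch, not the heron library's configuration format: the stage names, component labels, and the `trainable_components` helper are hypothetical. It also assumes the language model stays frozen during the first stage, which the description implies but does not state.

```python
# Hypothetical summary of the two training stages described above.
# Stage and component names are illustrative labels, not heron identifiers.
TRAINING_STAGES = [
    {
        "stage": "adaptor_pretraining",
        "dataset": "LLaVA-Pretrain-JA",
        "trainable": {"adaptor"},  # assumption: the LLM is frozen here
    },
    {
        "stage": "full_fine_tuning",
        "dataset": "LLaVA-Instruct-620K-JA-v2",
        "trainable": {"adaptor", "llm"},
    },
]


def trainable_components(stage_name: str) -> set:
    """Look up which components receive gradient updates in a given stage."""
    for stage in TRAINING_STAGES:
        if stage["stage"] == stage_name:
            return stage["trainable"]
    raise KeyError(f"unknown stage: {stage_name}")


if __name__ == "__main__":
    for stage in TRAINING_STAGES:
        names = ", ".join(sorted(stage["trainable"]))
        print(f'{stage["stage"]}: {stage["dataset"]} -> trains {names}')
```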

Training Dataset

LLaVA-Pretrain-JA
LLaVA-Instruct-620K-JA-v2

Intended Use

This model is intended for use in chat-like applications and for research purposes.

Limitations

The model may produce inaccurate or false information, and its accuracy is not guaranteed. It is still in the research and development stage.

📄 License

This project is licensed under the CC BY-NC 4.0 license.

📚 How to cite

@misc{inoue2024heronbench,
      title={Heron-Bench: A Benchmark for Evaluating Vision Language Models in Japanese}, 
      author={Yuichi Inoue and Kento Sasaki and Yuma Ochi and Kazuki Fujii and Kotaro Tanahashi and Yu Yamaguchi},
      year={2024},
      eprint={2404.07824},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
