Japanese-InstructBLIP-Alpha Open-Source Vision-Language Model - Generate Japanese Descriptions for Images for Free

Japanese InstructBLIP Alpha

Developed by stabilityai

A vision-language instruction-following model that generates Japanese descriptions for input images, optionally guided by a text prompt.

Image-to-Text

Transformers

Japanese

Open Source

License: Other

#Japanese Image Caption Generation #Visual-Language Instruction Following #Multimodal AI

Downloads 141

Release Time: 8/15/2023

Model Overview

Japanese InstructBLIP Alpha is a vision-language model based on the InstructBLIP architecture and tuned for Japanese. It generates Japanese descriptive text from an input image and an optional text prompt.

Model Features

Japanese Optimization

Tuned to generate image descriptions in Japanese

Multimodal Input

Supports simultaneous processing of image and text inputs for flexible interaction

Instruction Following

Follows the user's instruction in the text prompt when generating output

Lightweight Training

Trains only the Q-Former component; the vision encoder and the LLM stay frozen
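
A split between frozen and trainable components is commonly implemented by toggling a per-parameter trainable flag according to the component a parameter belongs to. The sketch below uses plain-Python stand-ins so it runs without the model. The `Param` class, the `freeze_all_but` helper, and the parameter-name prefixes (`vision_model.`, `qformer.`, `language_model.`) are hypothetical illustrations, not names taken from this model's code.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Stand-in for a framework parameter (hypothetical, for illustration only)."""
    name: str
    requires_grad: bool = True


def freeze_all_but(params, trainable_prefix):
    """Mark only parameters whose name starts with `trainable_prefix` as trainable.

    Returns the names of the parameters that remain trainable.
    """
    for p in params:
        p.requires_grad = p.name.startswith(trainable_prefix)
    return [p.name for p in params if p.requires_grad]


# Hypothetical parameter names, one group per component.
params = [
    Param("vision_model.encoder.layer0.weight"),
    Param("qformer.layer0.attention.weight"),
    Param("qformer.layer0.output.weight"),
    Param("language_model.layers.0.weight"),
]

trainable = freeze_all_but(params, "qformer.")
print(trainable)
```

In PyTorch, the same idea is typically expressed by iterating over `model.named_parameters()` and setting `requires_grad` on each tensor.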

Model Capabilities

Image Caption Generation

Visual Question Answering

Multimodal Understanding

Japanese Text Generation

Use Cases

Content Generation

Image Caption Generation

Generates detailed Japanese descriptions for input images

Example: Input a photo of Tokyo Skytree, output '桜と東京スカイツリー' (Cherry blossoms and Tokyo Skytree)

Assistive Tools

Visual Question Answering

Answers specific questions about image content

🚀 Japanese InstructBLIP Alpha

Japanese InstructBLIP Alpha is a vision-language instruction-following model that can generate Japanese descriptions for input images and optional input texts such as questions.

🚀 Quick Start

✨ Features

Japanese InstructBLIP Alpha is a vision-language instruction-following model. It generates Japanese descriptions for input images and can also handle optional input texts such as questions.

📦 Installation

First, install the additional dependencies listed in requirements.txt:

pip install sentencepiece einops
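
Before loading the model, it can help to confirm that the packages the usage example imports are actually available. The following is a minimal sketch using only the standard library. The `missing_packages` helper is a hypothetical name introduced here, and the list of required packages is inferred from the imports in the usage example plus the two packages installed above.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names used by the usage example ("PIL" is the import name of Pillow).
    required = ["torch", "transformers", "PIL", "requests", "sentencepiece", "einops"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```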

💻 Usage Examples

Basic Usage

import torch
from transformers import LlamaTokenizer, AutoModelForVision2Seq, BlipImageProcessor
from PIL import Image
import requests

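# Helper that formats the instruction-style prompt passed to the model.
# An empty `prompt` yields a captioning prompt; a non-empty one is added as an "入力" (input) section.
def build_prompt(prompt="", sep="\n\n### "):
    # System message: "Below is a combination of an instruction that describes a task and an
    # input that provides context. Write a response that appropriately satisfies the request."
    sys_msg = "以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。"
    p = sys_msg
    roles = ["指示", "応答"]  # "instruction", "response"
    # Default instruction: "Describe the given image in detail."
    user_query = "与えられた画像について、詳細に述べてください。"
    msgs = [": \n" + user_query, ": "]
    if prompt:
        roles.insert(1, "入力")  # "input"
        msgs.insert(1, ": \n" + prompt)
    for role, msg in zip(roles, msgs):
        p += sep + role + msg
    return p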

# load model
model = AutoModelForVision2Seq.from_pretrained("stabilityai/japanese-instructblip-alpha", trust_remote_code=True)
processor = BlipImageProcessor.from_pretrained("stabilityai/japanese-instructblip-alpha")
tokenizer = LlamaTokenizer.from_pretrained("novelai/nerdstash-tokenizer-v1", additional_special_tokens=['▁▁'])
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# prepare inputs
url = "https://images.unsplash.com/photo-1582538885592-e70a5d7ab3d3?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=1770&q=80"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
prompt = ""  # empty string for image captioning; pass a question here to ask about the image
prompt = build_prompt(prompt)
inputs = processor(images=image, return_tensors="pt")
text_encoding = tokenizer(prompt, add_special_tokens=False, return_tensors="pt")
# the Q-Former receives the same token ids and attention mask as the language model
text_encoding["qformer_input_ids"] = text_encoding["input_ids"].clone()
text_encoding["qformer_attention_mask"] = text_encoding["attention_mask"].clone()
inputs.update(text_encoding)

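# generate
# NOTE: the source listing is cut off after "num_beam". The decoding settings and
# the decode step below are assumed values and are not confirmed by this page.
outputs = model.generate(
    **inputs.to(device, dtype=model.dtype),
    num_beams=5,        # assumed
    max_new_tokens=32,  # assumed
    min_length=1,       # assumed
)
generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0].strip()
print(generated_text)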
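
To make the prompt layout concrete, the sketch below repeats the `build_prompt` helper from the listing above and prints the prompts it produces for captioning (empty input) and for a question. The example question string is a hypothetical illustration and is not taken from the model card.

```python
# Copy of the build_prompt helper from the usage listing, repeated so this block runs on its own.
def build_prompt(prompt="", sep="\n\n### "):
    sys_msg = "以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。"
    p = sys_msg
    roles = ["指示", "応答"]
    user_query = "与えられた画像について、詳細に述べてください。"
    msgs = [": \n" + user_query, ": "]
    if prompt:
        roles.insert(1, "入力")
        msgs.insert(1, ": \n" + prompt)
    for role, msg in zip(roles, msgs):
        p += sep + role + msg
    return p


# Captioning: no extra input, so the prompt has only the instruction and response sections.
caption_prompt = build_prompt("")

# Question answering: the question (hypothetical example, "Where was this photo taken?")
# is placed in an additional "入力" (input) section between instruction and response.
question = "この写真はどこで撮影されましたか？"
vqa_prompt = build_prompt(question)

print(caption_prompt)
print("-----")
print(vqa_prompt)
```

Both prompts end with the open `### 応答: ` section, which is where the model's generated text continues.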

📄 License

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Access Japanese StableLM Instruct Alpha

By using or distributing any portion or element of the Software Products, you agree to be bound by the following agreement:

JAPANESE STABLELM RESEARCH LICENSE AGREEMENT

Dated: August 7, 2023

"Agreement" means the terms and conditions for use, reproduction, distribution and modification of the Software Products set forth herein.

“Documentation” means any specifications, manuals, documentation, and other written information provided by Stability AI related to the Software.

"Licensee" or "you" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person’s or entity's behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.

"Stability AI" or "we" means Stability AI Ltd.

"Software" means, collectively, Stability AI’s proprietary Japanese StableLM made available under this Agreement.

“Software Products” means Software and Documentation.

License Rights and Redistribution:

Subject to your compliance with this Agreement and the Documentation, Stability AI grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty free and limited license under Stability AI’s intellectual property or other rights owned by Stability AI embodied in the Software Products to reproduce, distribute, and create derivative works of the Software Products for purposes other than commercial or production use.
You will not, and will not permit, assist or cause any third party to use, modify, copy, reproduce, create derivative works of, or distribute the Software Products (or any derivative works thereof, works incorporating the Software Products, or any data produced by the Software), in whole or in part, for any commercial or production purposes.
If you distribute or make the Software Products, or any derivative works thereof, available to a third party, you shall (i) provide a copy of this Agreement to such third party, and (ii) retain the following attribution notice within a "Notice" text file distributed as a part of such copies: "Japanese StableLM is licensed under the Japanese StableLM Research License, Copyright (c) Stability AI Ltd. All Rights Reserved."
The licenses granted to you under this Agreement are conditioned upon your compliance with the Documentation and this Agreement, including the Acceptable Use Policy below and as may be updated from time to time in the future on stability.ai, which is hereby incorporated by reference into this Agreement.

Disclaimer of Warranty: UNLESS REQUIRED BY APPLICABLE LAW, THE SOFTWARE PRODUCTS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE SOFTWARE PRODUCTS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE SOFTWARE PRODUCTS AND ANY OUTPUT AND RESULTS.

Limitation of Liability: IN NO EVENT WILL STABILITY AI OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF STABILITY AI OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.

Intellectual Property:

No trademark licenses are granted under this Agreement, and in connection with the Software Products, neither Stability AI nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Software Products.
Subject to Stability AI’s ownership of the Software Products and derivatives made by or for Stability AI, with respect to any derivative works and modifications of the Software Products that are made by you, as between you and Stability AI, you are and will be the owner of such derivative works and modifications.
If you institute litigation or other proceedings against Stability AI (including a cross-claim or counterclaim in a lawsuit) alleging that the Software Products or associated outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Stability AI from and against any claim by any third party arising out of or related to your use or distribution of the Software Products in violation of this Agreement.

Term and Termination: The term of this Agreement will commence upon your acceptance of this Agreement or access to the Software Products and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Stability AI may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Software Products. Sections 2-4 shall survive the termination of this Agreement.

Japanese StableLM Acceptable Use Policy

If you access, use, or distribute any Stability AI models, software, or other materials (“Stability Technology”) you agree to this Acceptable Use Policy (“Policy”).

We want everyone to use Stability Technology safely and responsibly. You agree you will not use, or allow others to use, Stability Technology to:

To violate the law or others’ rights (including intellectual property rights and the rights of data privacy and protection), nor will you promote, contribute to, encourage, facilitate, plan, incite, or further anyone else’s violation of the law or others’ rights;
To commit, promote, contribute to, facilitate, encourage, plan, incite, or further any of the following:
- Violence or terrorism;
- Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content;
- Human trafficking, exploitation, and sexual violence;
- Harassment, abuse, threatening, stalking, or bullying of individuals or groups of individuals;
- Discrimination in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services on the basis of race, color, caste, religion, sex (including pregnancy, sexual orientation, or gender identity), national origin, age, disability, or genetic information (including family medical history) except as may be required by applicable law (such as the provision of social security benefits solely to people who meet certain age requirements under the law);
- Creation of malicious code, malware, computer viruses or any activity that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system;
For purposes of or for the performance of:
- Fully automated decision-making, including profiling, with respect to an individual or group of individuals which produces legal effects concerning such individual(s) or similarly significantly affects such individual(s);
- Systematic or automated scraping, mining, extraction, or harvesting of personally identifiable data, or similar activity, from the output of any Stability Technology except with respect to data that you have provided as input to the Stability Technology and which you are legally entitled to process, for so long as you retain such entitlement;
- Development, improvement, or manufacture of any weapons of mass destruction (such as nuclear, chemical, or biologic weapons), weapons of war (such as missiles or landmines), or any gain of function-related activities with respect to any pathogens;
- Mission critical applications or systems where best industry practices require fail-safe controls or performance, including operation of nuclear facilities, aircraft navigation, electrical grids, communication systems, water treatment facilities, air traffic control, life support, weapons systems, or emergency locator or other emergency services;
To intentionally deceive or mislead others, including use of Japanese StableLM related to the following:
- Generating, promoting, or furthering fraud or the creation or promotion of disinformation;
- Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content;
- Generating, promoting, or further distributing spam;
- Impersonating another individual without consent, authorization, or legal right
- Representing or misleading people into believing that the use of Japanese StableLM or outputs are human-generated;
- Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement;
- Generating or facilitating large-scale political advertisements, propaganda, or influence campaigns;
Fail to appropriately disclose to end users any known dangers of your AI system or misrepresent or mislead with respect to its abilities.

Nothing in this AUP is intended to prevent or impede any good faith research, testing, or evaluation of Japanese StableLM, or publication related to any of the foregoing. If you discover any flaws in Japanese StableLM that may be harmful to people in any way, we encourage you to notify us and give us a chance to remedy such flaws before others can exploit them. If you have questions about this AUP, contact us at legal@stability.ai.
