Open-source InternLM-XComposer2-vl-7b model: a practical, free-to-deploy model for image-text understanding and composition


InternLM-XComposer2-VL-7B

Developed by internlm

InternLM-XComposer2 is a vision-language large model built on InternLM2, with strong capabilities in image-text understanding and composition.

Text-to-Image

Transformers

Open Source License: Other #Image understanding and composition #Multimodal large model #Vision-language interaction

Downloads 1,902

Release Time : 1/25/2024

Model Overview

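InternLM-XComposer2 is a vision-language large model released as a pretrained VL model and as a version fine-tuned for free-form image-text composition. It performs well on many multimodal benchmarks.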

Model Features

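Strong image-text understanding

Performs well on many multimodal benchmarks and can interpret image content in depth

Free-form image-text composition

Optimized for free-form image-text composition and supports complex image-text interaction

Efficient inference

Supports loading in float16 precision, which reduces VRAM usage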

Model Capabilities

Image content understanding

Visual question answering

Image-text composition

Image caption generation

Use Cases

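Content creation

Image caption generation

Generates a detailed description from an input image

In the example, the model generated an image description covering the scene, the atmosphere, and the underlying meaning

Education

Visual question answering

Answers a variety of questions about image content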

🚀 InternLM-XComposer2

InternLM-XComposer2 is a vision-language large model (VLLM) based on InternLM2 for advanced text-image comprehension and composition.


💻Github Repo

Paper

The InternLM-XComposer2 series is released in two versions:

InternLM-XComposer2-VL: the pretrained VLLM, which uses InternLM2 as the LLM initialization and achieves strong performance on various multimodal benchmarks.
InternLM-XComposer2: the VLLM fine-tuned for Free-form Interleaved Text-Image Composition.

🚀 Quick Start

Here is a simple example of using InternLM-XComposer with 🤗 Transformers.

Basic usage

import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2-vl-7b', trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2-vl-7b', trust_remote_code=True)

query = '<ImageHere>Please describe this image in detail.'
image = './image1.webp'
with torch.cuda.amp.autocast():
  response, _ = model.chat(tokenizer, query=query, image=image, history=[], do_sample=False)
print(response)
#The image features a quote by Oscar Wilde, "Live life with no excuses, travel with no regret,"
# set against a backdrop of a breathtaking sunset. The sky is painted in hues of pink and orange,
# creating a serene atmosphere. Two silhouetted figures stand on a cliff, overlooking the horizon.
# They appear to be hiking or exploring, embodying the essence of the quote.
# The overall scene conveys a sense of adventure and freedom, encouraging viewers to embrace life without hesitation or regrets.
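
For the visual question answering use case, the same call can be repeated for several questions about one image. The following is a minimal sketch, not part of the model's API: `ask_about_image` and `StubModel` are hypothetical names introduced here, and each question is sent as a fresh single-turn chat with exactly the arguments used in the example above. The stub only mimics the shape of `model.chat` so that the sketch runs without downloading weights.

```python
from typing import Dict, List


def ask_about_image(model, tokenizer, image_path: str, questions: List[str]) -> Dict[str, str]:
    """Ask several independent questions about one image.

    Hypothetical helper (not part of the model's API). Each question is sent
    as a fresh single-turn chat, mirroring the call in the example above.
    """
    answers = {}
    for question in questions:
        # The example query starts with the <ImageHere> placeholder, so add it if missing.
        query = question if question.startswith("<ImageHere>") else "<ImageHere>" + question
        response, _ = model.chat(
            tokenizer, query=query, image=image_path, history=[], do_sample=False
        )
        answers[question] = response
    return answers


class StubModel:
    """Stand-in with the same chat() call shape, so the sketch runs without weights."""

    def chat(self, tokenizer, query, image, history, do_sample):
        return f"[answer to {query!r} about {image}]", history


if __name__ == "__main__":
    result = ask_about_image(StubModel(), None, "./image1.webp", ["What is in the image?"])
    print(result)
```

With the real model, pass the `model` and `tokenizer` created above instead of the stub, and call the helper inside `torch.cuda.amp.autocast()` as the example does.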

Loading from Transformers

To load the InternLM-XComposer2-VL-7B model with Transformers, use the following code.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
ckpt_path = "internlm/internlm-xcomposer2-vl-7b"
tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True)
# Set `torch_dtype=torch.float16` to load model in float16, otherwise it will be loaded as float32 and might cause OOM Error.
model = AutoModelForCausalLM.from_pretrained(ckpt_path, torch_dtype=torch.float16, trust_remote_code=True).cuda()
model = model.eval()
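
The comment in the code above warns that float32 loading may run out of memory. A rough back-of-the-envelope estimate shows why. This sketch assumes a nominal 7 billion parameters, taken from the "7b" in the model name; the actual parameter count may differ, and activations and framework overhead are not included. `weight_memory_gib` is a hypothetical helper introduced for illustration.

```python
# Bytes used per parameter for common floating-point dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate memory for the weights alone, in GiB (1 GiB = 2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    nominal_params = 7e9  # assumption: taken from the "7b" in the model name
    for dtype in ("float32", "float16"):
        print(f"{dtype}: ~{weight_memory_gib(nominal_params, dtype):.1f} GiB")
```

Under this assumption the weights need roughly 26 GiB in float32 and roughly 13 GiB in float16, so halving the precision halves the footprint.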

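📄 License

The code is licensed under Apache 2.0. The model weights are fully open for academic research and also allow free commercial use. To apply for a commercial license, fill in the application form (English) / application form (Chinese). For other questions or collaborations, contact internlm@pjlab.org.cn.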