InternLM-XComposer2-VL-1_8b: Open-Source Vision-Language Model

Developed by internlm

A vision-language large model based on InternLM2, with strong image understanding and composition capabilities

Text-to-Image

Transformers

Open Source License: Other #Image understanding and composition #Multimodal large model #Vision-language interaction

Downloads 169

Release Time : 4/9/2024

Model Overview

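InternLM-XComposer2-VL is a vision-language large model (VLLM) based on InternLM2. It performs strongly on multiple multimodal benchmarks and supports both image understanding and composition.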

Model Features

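Multimodal understanding

Processes and understands image and text information together

Text-image composition

Supports free-form interleaved text-image composition tasks

High performance

Strong results on multiple multimodal benchmarks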

Model Capabilities

Image understanding

Visual question answering

Image caption and description generation

Multimodal content creation

Use Cases

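Content creation

Text-image content generation

Generates a detailed description of an image or composes related text content

The example shows that the model can describe the image content accurately and interpret the text that appears in it

Visual question answering

Image understanding and analysis

Answers a variety of questions about image content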

🚀 InternLM-XComposer2

InternLM-XComposer2 is a vision-language large model (VLLM) based on InternLM2 for advanced text-image comprehension and composition. It models the interaction between text and images and performs well across a range of tasks.

💻Github Repo

Paper

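We release the InternLM-XComposer2 series in two versions:

InternLM-XComposer2-VL: The pretrained VLLM, with InternLM2 as the initialization of the LLM, achieving strong performance on various multimodal benchmarks.
InternLM-XComposer2: The VLLM fine-tuned for free-form interleaved text-image composition.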

🚀 Quick Start

The following simple example shows how to use InternLM-XComposer with 🤗 Transformers.

import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# Initialize the model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2-vl-1_8b', trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2-vl-1_8b', trust_remote_code=True)

query = '<ImageHere>Please describe this image in detail.'
image = './image1.webp'
with torch.cuda.amp.autocast():
  response, _ = model.chat(tokenizer, query=query, image=image, history=[], do_sample=False)
print(response)
# The image is a captivating photograph of a sunset over a mountainous landscape. The sky, painted in hues of orange and pink,
# serves as a backdrop for two silhouetted figures standing on the mountain. The text on the image, written in white, is a quote 
# from Oscar Wilde, which reads, "Live life with no excuses, travel with no regret." This quote, combined with the serene setting,
# serves as a powerful reminder to embrace life's journey without hesitation or regret.
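
The quick-start query begins with the `<ImageHere>` placeholder, followed by the question. As a minimal sketch, the helpers below build such a query and check that the image file exists before the model is called. `build_image_query` and `check_image_path` are hypothetical names introduced here for illustration; they are not part of the model's remote code, and the assumption that the placeholder belongs at the start of the query is taken only from the example above.

```python
from pathlib import Path

# Placeholder token as it appears in the quick-start query above.
IMAGE_PLACEHOLDER = "<ImageHere>"


def build_image_query(question: str) -> str:
    """Prefix a question with the image placeholder unless it is already present.

    Hypothetical helper, not part of the model's API.
    """
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    if IMAGE_PLACEHOLDER in question:
        return question
    return IMAGE_PLACEHOLDER + question


def check_image_path(path: str) -> str:
    """Return the path unchanged if it points to an existing file, else raise.

    Hypothetical helper; it only fails early with a clear error message.
    """
    if not Path(path).is_file():
        raise FileNotFoundError(f"image not found: {path}")
    return path
```

With these helpers, the call from the example could be written as `model.chat(tokenizer, query=build_image_query('Please describe this image in detail.'), image=check_image_path('./image1.webp'), history=[], do_sample=False)`.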

💻 Usage Examples

Basic usage

To load the InternLM-XComposer2-VL-1.8B model with Transformers, use the following code.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
ckpt_path = "internlm/internlm-xcomposer2-vl-1_8b"
tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True)
# Set `torch_dtype=torch.float16` to load the model in float16; otherwise it is loaded in float32, which may cause an OOM error.
model = AutoModelForCausalLM.from_pretrained(ckpt_path, torch_dtype=torch.float16, trust_remote_code=True).cuda()
model = model.eval()
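
The comment in the loading code warns that float32 loading may run out of memory. A rough way to see why is to estimate the memory needed for the weights alone as parameter count times bytes per parameter. The sketch below assumes a nominal 1.8e9 parameters, taken from the "1_8b" in the model name; the real count (including the vision encoder) may differ, and activations and other runtime overhead are not included. `weight_memory_gib` is a hypothetical helper.

```python
# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}

# Assumption: nominal parameter count read off the "1_8b" model name.
NOMINAL_PARAMS = 1.8e9


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Estimate the memory for the weights only, in GiB (hypothetical helper)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in ("float32", "float16"):
    print(f"{dtype}: about {weight_memory_gib(NOMINAL_PARAMS, dtype):.2f} GiB for weights")
```

Under this assumption, float16 halves the weight footprint relative to float32, which is the motivation for passing `torch_dtype=torch.float16`.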

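📄 License

The code is licensed under Apache 2.0. The model weights are fully open for academic research and also allow free commercial use. To apply for a commercial license, fill in the application form (English) / 申请表 (Chinese). For other questions or collaborations, contact internlm@pjlab.org.cn.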