InternLM-XComposer2-VL-7B-4Bit: Open-Source Vision-Language Model

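Developed by internlm

A vision-language large model based on InternLM2, with strong image-text comprehension and composition capabilities.

Image-to-Text

Transformers

Open-source license: Other #Interleaved text-image composition #Multimodal understanding #Vision-language model

Downloads: 1,635

Release date: 2/6/2024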

Model Overview

InternLM-XComposer2-VL is a pretrained vision-language model that uses InternLM2 as its large language model backbone and performs strongly on multimodal benchmarks.

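Model Features

Multimodal comprehension and composition

Offers strong image-text comprehension and composition, and supports free-form interleaved text-image composition.

Quantized version

Provided as a 4-bit quantized version, which lowers compute resource requirements.

High performance

Performs strongly on multimodal benchmarks.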
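To give a rough sense of why the 4-bit quantized version lowers resource requirements, the sketch below estimates the storage needed for the weights alone at 16 bits and at 4 bits per weight. It assumes "7B" means about 7 billion parameters and that every weight is quantized. Quantization metadata (scales and zero points), activations, and any modules kept in higher precision are ignored, so real memory use will differ. The function name `weight_memory_gib` is made up for this illustration.

```python
# Back-of-the-envelope estimate of weight storage only.
# Assumptions (not from the model card): 7e9 parameters, all of them quantized.

def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Return the storage needed for the weights alone, in GiB."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1024**3

N_PARAMS = 7e9  # assumption: "7B" read as 7 billion parameters

fp16_gib = weight_memory_gib(N_PARAMS, 16)
int4_gib = weight_memory_gib(N_PARAMS, 4)

print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f" 4-bit weights: {int4_gib:.1f} GiB")
print(f"ratio: {fp16_gib / int4_gib:.0f}x")
```

Under these assumptions the 4-bit weights take a quarter of the space of 16-bit weights. The actual saving is likely smaller if some modules are left unquantized.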

Model Capabilities

Image-text comprehension

Image-text composition

Multimodal interaction

Text generation

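Use Cases

Content creation

Image captioning

Generates a detailed description from an input image.

Produces accurate, detailed image descriptions.

Interleaved text-image composition

Supports free-form composition of interleaved text and image content.

Produces content that combines images and text.

Visual question answering

Image content Q&A

Answers a range of questions about the content of an image.

Understands the image content accurately and answers the question.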

🚀 InternLM-XComposer2

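InternLM-XComposer2 is a vision-language large model (VLLM) based on InternLM2 for advanced text-image comprehension and composition. It is designed to understand how text and images relate to each other and to compose the two together.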

InternLM-XComposer2

💻Github Repo

Paper

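We release the InternLM-XComposer2 series in two versions:

InternLM-XComposer2-VL: the pretrained VLLM, with InternLM2 as the initialization of the LLM, which achieves strong performance on various multimodal benchmarks.
InternLM-XComposer2: the VLLM fine-tuned for free-form text-image composition.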

This is the 4-bit version of InternLM-XComposer2-VL. Install the latest version of auto_gptq before using it.
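Before loading the model, it can help to confirm that auto_gptq is present in the current environment. The sketch below uses only the standard library. The helper `installed_version` is hypothetical, and the distribution name "auto-gptq" is an assumption about how the package is published. The check reports whether a version is installed, not whether it is the latest one.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the package is distributed under the name "auto-gptq".
version = installed_version("auto-gptq")
if version is None:
    print("auto_gptq does not appear to be installed")
else:
    print(f"auto_gptq {version} is installed")
```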

🚀 Quick Start

The following simple example shows how to use InternLM-XComposer with 🤗 Transformers.

import torch, auto_gptq
from transformers import AutoModel, AutoTokenizer 
from auto_gptq.modeling import BaseGPTQForCausalLM

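# Register "internlm" so that auto_gptq accepts this model type
auto_gptq.modeling._base.SUPPORTED_MODELS = ["internlm"]
# Inference only: disable gradient tracking
torch.set_grad_enabled(False)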

class InternLMXComposer2QForCausalLM(BaseGPTQForCausalLM):
    layers_block_name = "model.layers"
    outside_layer_modules = [
        'vit', 'vision_proj', 'model.tok_embeddings', 'model.norm', 'output', 
    ]
    inside_layer_modules = [
        ["attention.wqkv.linear"],
        ["attention.wo.linear"],
        ["feed_forward.w1.linear", "feed_forward.w3.linear"],
        ["feed_forward.w2.linear"],
    ]
 
# init model and tokenizer
model = InternLMXComposer2QForCausalLM.from_quantized(
  'internlm/internlm-xcomposer2-vl-7b-4bit', trust_remote_code=True, device="cuda:0").eval()
tokenizer = AutoTokenizer.from_pretrained(
  'internlm/internlm-xcomposer2-vl-7b-4bit', trust_remote_code=True)

# <ImageHere> marks where the image is placed in the prompt
query = '<ImageHere>Please describe this image in detail.'
image = 'examples/image1.webp'
with torch.cuda.amp.autocast():
  response, _ = model.chat(tokenizer, query=query, image=image, history=[], do_sample=False)
print(response)
#The image features a quote by Oscar Wilde, "Live life with no excuses, travel with no regrets." 
#The quote is displayed in white text against a dark background. In the foreground, there are two silhouettes of people standing on a hill at sunset. 
#They appear to be hiking or climbing, as one of them is holding a walking stick. 
#The sky behind them is painted with hues of orange and purple, creating a beautiful contrast with the dark figures.
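The quick-start query places the `<ImageHere>` placeholder in front of the prompt text. As a small convenience, the sketch below prepends the placeholder only when it is missing. `build_query` is a hypothetical helper written for this illustration and is not part of the model's API. Putting the placeholder at the start mirrors the example above and is not a documented requirement.

```python
IMAGE_PLACEHOLDER = "<ImageHere>"


def build_query(prompt: str) -> str:
    """Return the prompt with the image placeholder prepended if it is missing."""
    prompt = prompt.strip()
    if IMAGE_PLACEHOLDER in prompt:
        # Respect a placeholder the caller has already positioned.
        return prompt
    return IMAGE_PLACEHOLDER + prompt


query = build_query("Please describe this image in detail.")
print(query)
```

The resulting string can be passed as the `query` argument of `model.chat` in the same way as in the quick-start example.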

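📄 License

The code is licensed under Apache 2.0. The model weights are fully open for academic research and also allow free commercial use. To apply for a commercial license, fill in the application form (English) / 申请表 (Chinese). For other questions or collaborations, contact internlm@pjlab.org.cn.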