mplug-owl-llama-7b: an open-source multimodal large model that supports image understanding and text generation tasks

Mplug Owl Llama 7b

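Developed by MAGAer13

mPLUG-Owl is a multimodal large language model built on the LLaMA-7B architecture. It supports image understanding and text generation tasks.

Image-to-Text

Transformers

English | Open-source license: Apache-2.0 | #multimodal-dialogue #image-understanding #meme-analysis

Downloads: 327

Release date: 5/8/2023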

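Model Overview

The model combines vision and language processing: it interprets image content and then generates a related text description or answers questions about it, which makes it suitable for multimodal interaction scenarios.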

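Model Features

Multimodal understanding

Processes image and text inputs together to achieve cross-modal content understanding

Conversational interaction

Supports a multi-turn dialogue template, enabling natural-language interaction

Open-domain applications

Applicable to open-domain visual question answering and image caption generation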

Model Capabilities

Image content understanding

Visual question answering

Meme analysis

Multi-turn dialogue generation

Cross-modal reasoning

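Use Cases

Social media analysis

Meme interpretation

Analyzes the humor and cultural context of internet memes

Generates explanations of the humor that match how people understand it

Creative assistance

Image caption generation

Automatically generates descriptive text for visual content

Produces accurate text descriptions that fit the context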

🚀 mPLUG-Owl

mPLUG-Owl is an image-to-text model. This project explains how to use mPLUG-Owl, including how to initialize the model and run inference.

🚀 Quick Start

Get the latest codebase from GitHub

git clone https://github.com/X-PLUG/mPLUG-Owl.git

Model initialization

import torch  # needed for torch.bfloat16 here and for torch.no_grad() during inference

from mplug_owl.modeling_mplug_owl import MplugOwlForConditionalGeneration
from mplug_owl.tokenization_mplug_owl import MplugOwlTokenizer
from mplug_owl.processing_mplug_owl import MplugOwlImageProcessor, MplugOwlProcessor

pretrained_ckpt = 'MAGAer13/mplug-owl-llama-7b'
# Load the weights in bfloat16.
model = MplugOwlForConditionalGeneration.from_pretrained(
    pretrained_ckpt,
    torch_dtype=torch.bfloat16,
)
# The processor combines image preprocessing and text tokenization.
image_processor = MplugOwlImageProcessor.from_pretrained(pretrained_ckpt)
tokenizer = MplugOwlTokenizer.from_pretrained(pretrained_ckpt)
processor = MplugOwlProcessor(image_processor, tokenizer)
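
Loading with `torch_dtype=torch.bfloat16` stores each weight in 2 bytes instead of the 4 bytes used by float32. The sketch below estimates the resulting weight memory. It assumes a nominal 7 billion parameters, taken only from the "7B" in the model name; the real checkpoint also contains vision components, so treat the numbers as a rough lower bound. `weight_memory_gib` and `NOMINAL_PARAMS` are hypothetical names used for illustration.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params, dtype):
    """Return the memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Assumption: nominal parameter count implied by the "7b" in the model name.
NOMINAL_PARAMS = 7_000_000_000

for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: ~{weight_memory_gib(NOMINAL_PARAMS, dtype):.1f} GiB")
```

Activations and generation caches add to this figure, so the estimate covers the weights only.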

Model inference

Prepare the model inputs.

# Use the Human/AI template to organize the context as a multi-turn conversation.
# <image> is the placeholder for an image.
prompts = [
'''The following is a conversation between a curious human and AI assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
Human: <image>
Human: Explain why this meme is funny.
AI: ''']

# Put the image paths in image_list, in the same order as the <image> placeholders in the prompts.
# URLs, local file paths, and Base64 strings are supported. You can customize image preprocessing by modifying mplug_owl.modeling_mplug_owl.ImageProcessor.
image_list = ['https://xxx.com/image.jpg']
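
The prompt above follows a fixed layout: an opening instruction sentence, a series of `Human:` and `AI:` turns, and a trailing `AI: ` slot for the model to complete. A minimal sketch of a helper that assembles this layout for one or more turns might look like the following. `build_prompt`, `check_image_count`, and the constant names are hypothetical and not part of the `mplug_owl` package. The rule that the number of `<image>` placeholders should equal the number of images is an assumption inferred from the ordering note above.

```python
SYSTEM_PROMPT = (
    "The following is a conversation between a curious human and AI assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)
IMAGE_PLACEHOLDER = "<image>"


def build_prompt(turns, system_prompt=SYSTEM_PROMPT):
    """Render (speaker, text) turns in the Human/AI layout, ending with an open 'AI: ' slot."""
    lines = [system_prompt]
    for speaker, text in turns:
        if speaker not in ("Human", "AI"):
            raise ValueError(f"unknown speaker: {speaker!r}")
        lines.append(f"{speaker}: {text}")
    lines.append("AI: ")
    return "\n".join(lines)


def check_image_count(prompts, image_list):
    """Assumed rule: one image per <image> placeholder across all prompts."""
    expected = sum(prompt.count(IMAGE_PLACEHOLDER) for prompt in prompts)
    if expected != len(image_list):
        raise ValueError(f"{expected} placeholder(s) but {len(image_list)} image(s)")


first_turn = [("Human", IMAGE_PLACEHOLDER), ("Human", "Explain why this meme is funny.")]
example_prompts = [build_prompt(first_turn)]
check_image_count(example_prompts, ["https://xxx.com/image.jpg"])
```

To continue the conversation, append the model's reply as an `("AI", reply)` turn followed by a new `("Human", question)` turn and call `build_prompt` again.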

Get the response.

# generate kwargs (the same as in transformers) can be passed to model.generate().
generate_kwargs = {
    'do_sample': True,
    'top_k': 5,
    'max_length': 512
}
from PIL import Image
# Image.open() reads local paths or file-like objects; download remote URLs first.
images = [Image.open(path) for path in image_list]
inputs = processor(text=prompts, images=images, return_tensors='pt')
# Cast float tensors to bfloat16 to match the model weights, then move all tensors to the model's device.
inputs = {k: v.bfloat16() if v.dtype == torch.float else v for k, v in inputs.items()}
inputs = {k: v.to(model.device) for k, v in inputs.items()}
with torch.no_grad():
    res = model.generate(**inputs, **generate_kwargs)
# Decode the first generated sequence back into text.
sentence = tokenizer.decode(res.tolist()[0], skip_special_tokens=True)
print(sentence)
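
The input-preparation comments list URLs, local file paths, and Base64 strings as image sources, while `Image.open` by itself only reads local paths and file-like objects. One way to bridge the gap is to resolve each source to raw bytes first. The sketch below uses only the standard library; `read_image_bytes` is a hypothetical helper, not part of `mplug_owl`, and the order in which the three source types are tried is an assumption.

```python
import base64
import binascii
import os
import urllib.request


def read_image_bytes(source, timeout=10.0):
    """Return raw image bytes for a URL, a local file path, or a Base64 string."""
    # Remote image: download it.
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source, timeout=timeout) as response:
            return response.read()
    # Local image: read the file.
    if os.path.isfile(source):
        with open(source, "rb") as handle:
            return handle.read()
    # Otherwise treat the string as Base64 and reject anything that is not valid.
    try:
        return base64.b64decode(source, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("source is not a URL, an existing file, or valid Base64") from exc


# Usage with Pillow (needs `import io` and `from PIL import Image`):
# images = [Image.open(io.BytesIO(read_image_bytes(s))) for s in image_list]
```

Returning bytes rather than an image object keeps the helper independent of Pillow, so the decoding step stays in the inference code.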