mplug-owl-llama-7b: an open-source multimodal large model for image understanding and text generation

Mplug Owl Llama 7b

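Developed by MAGAer13

mPLUG-Owl is a multimodal large language model built on LLaMA-7B. It supports image understanding and text generation tasks.

Image-to-Text

Transformers

Language: English | License: Apache-2.0 | Tags: #multimodal-dialogue #image-understanding #meme-analysis

Downloads: 327

Released: 5/8/2023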

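Model overview

The model combines vision and language processing: it interprets image content and generates related text descriptions or answers to questions, which makes it suitable for multimodal interaction.

Model features

Multimodal understanding

Processes image and text inputs together to understand content across modalities

Conversational interaction

Supports a multi-turn dialogue template for natural-language interaction

Open-domain use

Suited to open-domain visual question answering and image captioning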

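Model capabilities

Image content understanding

Visual question answering

Meme analysis

Multi-turn dialogue generation

Cross-modal reasoning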

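Use cases

Social media analysis

Meme interpretation

Analyzes the humor and cultural context of internet memes

Produces explanations of the humor that match how a human reader would understand it

Assisted content creation

Image caption generation

Automatically generates captions for visual content

Produces text descriptions that are accurate and fit the context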

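🚀 mPLUG-Owl

mPLUG-Owl is an image-to-text model: given an image and a text prompt, it generates text such as a description of the image or an answer to a question about it.

🚀 Quick start

📦 Installation

Get the latest code from GitHub:

git clone https://github.com/X-PLUG/mPLUG-Owl.git

💻 Usage examples

Basic usage

Model initialization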

import torch

from mplug_owl.modeling_mplug_owl import MplugOwlForConditionalGeneration
from mplug_owl.tokenization_mplug_owl import MplugOwlTokenizer
from mplug_owl.processing_mplug_owl import MplugOwlImageProcessor, MplugOwlProcessor

# Checkpoint name on the Hugging Face Hub.
pretrained_ckpt = 'MAGAer13/mplug-owl-llama-7b'
# Load the weights in bfloat16 (2 bytes per parameter instead of 4 for float32).
model = MplugOwlForConditionalGeneration.from_pretrained(
    pretrained_ckpt,
    torch_dtype=torch.bfloat16,
)
# The processor pairs the image preprocessor with the tokenizer.
image_processor = MplugOwlImageProcessor.from_pretrained(pretrained_ckpt)
tokenizer = MplugOwlTokenizer.from_pretrained(pretrained_ckpt)
processor = MplugOwlProcessor(image_processor, tokenizer)
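As a rough guide to what torch_dtype=torch.bfloat16 saves, the sketch below estimates the memory taken by the weights alone. The parameter count is an assumption: it is the nominal 7 billion read off the "7b" in the model name, not a measured figure, and activations and generation caches are not counted.

```python
# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}

# Assumption: nominal parameter count taken from the "7b" in the model name.
NOMINAL_PARAMS = 7_000_000_000


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Estimate the memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: ~{weight_memory_gib(NOMINAL_PARAMS, dtype):.1f} GiB")
```

Under that assumption the weights need about 13 GiB in bfloat16 and about 26 GiB in float32.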

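Model inference

Prepare the model inputs:

# The Human/AI template organizes the context as a multi-turn conversation.
# <image> is a placeholder for one image.
prompts = [
'''The following is a conversation between a curious human and AI assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
Human: <image>
Human: Explain why this meme is funny.
AI: ''']

# List the images in image_list in the same order as the <image> placeholders in the prompt.
# The upstream note says URLs, local file paths, and base64 strings are supported; the Image.open call
# used below reads only local paths and file objects, so download or decode other sources first.
# To customize image preprocessing, modify mplug_owl.modeling_mplug_owl.ImageProcessor.
image_list = ['https://xxx.com/image.jpg']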
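The note above names three kinds of image source. A minimal sketch of handling all three with only the standard library follows; read_image_bytes is a hypothetical helper, not part of the mPLUG-Owl codebase, and the order of the checks (URL, then existing file, then base64) is an assumption.

```python
import base64
import binascii
import os
import urllib.request


def read_image_bytes(source: str) -> bytes:
    """Return raw image bytes from a URL, a local file path, or a base64 string.

    Hypothetical helper for illustration; not part of mPLUG-Owl.
    """
    if source.startswith(("http://", "https://")):
        # Remote image: download it (requires network access).
        with urllib.request.urlopen(source, timeout=30) as response:
            return response.read()
    if os.path.isfile(source):
        # Local image: read the file as-is.
        with open(source, "rb") as handle:
            return handle.read()
    try:
        # Anything else is treated as a base64-encoded image.
        return base64.b64decode(source, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("source is not a URL, an existing file, or valid base64") from exc
```

The bytes can then be opened with Pillow, for example Image.open(io.BytesIO(read_image_bytes(src))), in place of passing the source string directly to Image.open.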

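Get the response:

from PIL import Image

# generate_kwargs takes the same options as generate() in transformers
# and is passed to model.generate() below.
generate_kwargs = {
    'do_sample': True,
    'top_k': 5,
    'max_length': 512
}
# Image.open accepts local paths or file objects, not URLs.
images = [Image.open(path) for path in image_list]
inputs = processor(text=prompts, images=images, return_tensors='pt')
# Cast float32 tensors to bfloat16 to match the model weights.
inputs = {k: v.bfloat16() if v.dtype == torch.float else v for k, v in inputs.items()}
# Move every input tensor to the device the model is on.
inputs = {k: v.to(model.device) for k, v in inputs.items()}
# Gradients are not needed for inference.
with torch.no_grad():
    res = model.generate(**inputs, **generate_kwargs)
# Decode the first (and only) generated sequence back to text.
sentence = tokenizer.decode(res.tolist()[0], skip_special_tokens=True)
print(sentence)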
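For conversations with more than one turn, the Human/AI template shown in the input preparation step can be assembled programmatically. The sketch below is one way to do it; build_prompt is a hypothetical helper, and appending earlier answers as "AI: ..." lines is an assumption based on the single-turn template rather than something the source documents.

```python
SYSTEM_PROMPT = (
    "The following is a conversation between a curious human and AI assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_prompt(turns, num_images=1):
    """Assemble a Human/AI prompt (hypothetical helper, not part of mPLUG-Owl).

    turns: list of (human_text, ai_text) pairs. Use None as ai_text for the
    final turn so the prompt ends with an open "AI: " for the model to complete.
    num_images: number of <image> placeholders, one per entry in image_list.
    """
    lines = [SYSTEM_PROMPT]
    lines.extend("Human: <image>" for _ in range(num_images))
    for human_text, ai_text in turns:
        lines.append(f"Human: {human_text}")
        lines.append("AI: " if ai_text is None else f"AI: {ai_text}")
    return "\n".join(lines)


# Single-turn case: reproduces the prompt used in the example above.
prompts = [build_prompt([("Explain why this meme is funny.", None)])]
```

A follow-up question would pass the previous answer as the second element of the earlier pair and add a new pair ending in None.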