mplug-owl-llama-7b: an open-source multimodal large model for image understanding and text generation


Mplug Owl Llama 7b

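Developed by MAGAer13

mPLUG-Owl is a multimodal large language model built on the LLaMA-7B architecture. It supports image understanding and text generation tasks.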

Image-to-Text

Transformers

Language: English | License: Apache-2.0 | #multimodal-dialogue #image-understanding #meme-analysis

Downloads: 327

Released: 5/8/2023

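Model Overview

The model combines vision and language processing. It can understand image content and either generate a related text description or answer questions about the image, which makes it suitable for multimodal interaction scenarios.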

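Model Features

Multimodal understanding

Processes image and text inputs together for cross-modal understanding

Conversational interaction

Supports a multi-turn dialogue template for natural-language interaction

Open-domain applications

Suited to open-domain visual question answering and image caption generation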

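Model Capabilities

Image content understanding

Visual question answering

Meme analysis

Multi-turn dialogue generation

Cross-modal reasoning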

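Use Cases

Social media analysis

Meme interpretation

Analyzes the humorous elements and cultural background of internet memes

Generates explanations of the humor that match human intuition

Creative assistance

Image caption generation

Automatically generates captions for visual content

Produces accurate, context-appropriate text descriptions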

🚀 mPLUG-Owl

mPLUG-Owl is an image-to-text model for tasks such as generating text descriptions of images, combining image understanding with text generation.

🚀 Quick Start

📦 Installation

Get the latest codebase from GitHub:

git clone https://github.com/X-PLUG/mPLUG-Owl.git
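The imports in the usage examples come from the `mplug_owl` package inside the cloned repository, so Python must be able to find it, for instance by running from the directory that contains the `mplug_owl` folder (its exact location inside the clone may differ between repository versions). The stdlib-only sketch below checks whether that package and the other libraries the examples use are importable; `module_available` is a hypothetical helper, not part of the repository.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be found on the current sys.path (without importing it)."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # find_spec raises ModuleNotFoundError (an ImportError) when the parent
        # package of a dotted name such as "pkg.sub" cannot be imported
        return False


if __name__ == "__main__":
    # mplug_owl comes from the cloned repo; the others are third-party dependencies
    for mod in ("mplug_owl", "torch", "transformers", "PIL"):
        status = "ok" if module_available(mod) else "missing"
        print(f"{mod}: {status}")
```

If `mplug_owl` reports `missing`, change into the directory that contains it or add that directory to `PYTHONPATH`.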

💻 Usage Examples

Basic Usage

Model Initialization

import torch
from mplug_owl.modeling_mplug_owl import MplugOwlForConditionalGeneration
from mplug_owl.tokenization_mplug_owl import MplugOwlTokenizer
from mplug_owl.processing_mplug_owl import MplugOwlImageProcessor, MplugOwlProcessor

pretrained_ckpt = 'MAGAer13/mplug-owl-llama-7b'
# Load the weights in bfloat16 to reduce memory usage
model = MplugOwlForConditionalGeneration.from_pretrained(
    pretrained_ckpt,
    torch_dtype=torch.bfloat16,
)
# Optionally move the model to a GPU, e.g. model = model.to('cuda')
# The processor wraps the image processor and the tokenizer so images and text are prepared together
image_processor = MplugOwlImageProcessor.from_pretrained(pretrained_ckpt)
tokenizer = MplugOwlTokenizer.from_pretrained(pretrained_ckpt)
processor = MplugOwlProcessor(image_processor, tokenizer)

Model Inference

Prepare the model inputs:

# We use a Human/AI template to organize the context as a multi-turn conversation.
# <image> marks an image placeholder.
prompts = [
'''The following is a conversation between a curious human and AI assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
Human: <image>
Human: Explain why this meme is funny.
AI: ''']

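# Image paths go in image_list, in the same order as their <image> placeholders appear in the prompt.
# URLs, local file paths, and base64 strings are supported. You can customize image preprocessing by modifying mplug_owl.modeling_mplug_owl.ImageProcessor.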
image_list = ['https://xxx.com/image.jpg']
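The Human/AI template above can also be assembled programmatically. The sketch below is an illustration, not part of the mPLUG-Owl codebase: `build_prompt` and `check_images` are hypothetical helpers. It assumes, by analogy with the single-image example, that several images are written as consecutive leading `Human: <image>` lines and that earlier answers appear as `AI: ...` lines. It also checks that the number of `<image>` placeholders matches the length of `image_list`.

```python
IMAGE_TOKEN = "<image>"
SYSTEM = (
    "The following is a conversation between a curious human and AI assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_prompt(turns, num_images=0, system=SYSTEM):
    """Build a Human/AI prompt string.

    turns: list of (human_text, ai_text) pairs. The ai_text of the LAST pair may be
    None, in which case the prompt ends with "AI: " so the model writes the answer.
    num_images: number of "Human: <image>" lines placed before the first question.
    """
    lines = [system]
    lines += ["Human: " + IMAGE_TOKEN] * num_images
    for i, (human, ai) in enumerate(turns):
        lines.append("Human: " + human)
        if ai is None:
            if i != len(turns) - 1:
                raise ValueError("only the last turn may be missing an AI reply")
            lines.append("AI: ")  # generation cue
        else:
            lines.append("AI: " + ai)
    return "\n".join(lines)


def check_images(prompt, image_list):
    """Raise ValueError if placeholder count and image count disagree."""
    n = prompt.count(IMAGE_TOKEN)
    if n != len(image_list):
        raise ValueError(
            f"prompt has {n} {IMAGE_TOKEN} placeholders but image_list has {len(image_list)} entries"
        )


# Usage: rebuild the single-image meme prompt shown above
prompt = build_prompt([("Explain why this meme is funny.", None)], num_images=1)
image_list = ['https://xxx.com/image.jpg']
check_images(prompt, image_list)
prompts = [prompt]
```

For this input, `build_prompt` produces exactly the same string as the hand-written prompt in the example.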

Get the response:

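import torch
import requests
from io import BytesIO
from PIL import Image

# Generation kwargs (the same as in transformers) can be passed to model.generate()
generate_kwargs = {
    'do_sample': True,
    'top_k': 5,
    'max_length': 512
}

def load_image(src):
    # Image.open() only reads local files or file-like objects, so download URLs first
    if src.startswith(('http://', 'https://')):
        resp = requests.get(src, timeout=30)
        resp.raise_for_status()
        return Image.open(BytesIO(resp.content)).convert('RGB')
    return Image.open(src).convert('RGB')

images = [load_image(src) for src in image_list]
inputs = processor(text=prompts, images=images, return_tensors='pt')
# Cast floating-point tensors (pixel values) to bfloat16 to match the model; integer token ids stay as-is
inputs = {k: v.bfloat16() if v.dtype == torch.float else v for k, v in inputs.items()}
inputs = {k: v.to(model.device) for k, v in inputs.items()}
with torch.no_grad():
    res = model.generate(**inputs, **generate_kwargs)
sentence = tokenizer.decode(res.tolist()[0], skip_special_tokens=True)
print(sentence)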
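Setting `do_sample=True` and `top_k=5` in `generate_kwargs` enables top-k sampling. At each decoding step only the k highest-scoring next tokens are kept, and one of them is drawn at random in proportion to its renormalized softmax probability. The pure-Python sketch below illustrates the idea on a toy list of logits. It is a simplified illustration (no temperature, batching, or tensors), not the transformers implementation, and `top_k_sample` is a hypothetical name.

```python
import math
import random


def top_k_sample(logits, k=5, rng=random):
    """Draw one token index from the k highest-scoring logits.

    The softmax is computed over those k logits only; all other tokens get zero probability.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Indices of the k largest logits
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Unnormalized softmax weights over the top-k (subtract the max for numerical stability)
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    # random.choices normalizes the weights itself
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)
logits = [2.0, 1.0, 0.5, 3.0, -1.0, 0.0, 4.0]
samples = [top_k_sample(logits, k=2, rng=rng) for _ in range(1000)]
# With k=2 only the two highest-scoring indices (6 and 3) can ever be drawn
print(sorted(set(samples)))
```

A smaller `top_k` makes the output more conservative. Setting `do_sample=False` instead switches `generate()` to deterministic greedy decoding.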