🚀 mPLUG - Owl
mPLUG - Owl是一個圖像到文本的模型,可用於根據圖像生成文本描述等任務,在圖像理解和文本生成方面具有強大的能力。
🚀 快速開始
📦 安裝指南
從Github獲取最新的代碼庫:
git clone https://github.com/X-PLUG/mPLUG-Owl.git
💻 使用示例
基礎用法
模型初始化
from mplug_owl.modeling_mplug_owl import MplugOwlForConditionalGeneration
from mplug_owl.tokenization_mplug_owl import MplugOwlTokenizer
from mplug_owl.processing_mplug_owl import MplugOwlImageProcessor, MplugOwlProcessor
pretrained_ckpt = 'MAGAer13/mplug-owl-llama-7b'
model = MplugOwlForConditionalGeneration.from_pretrained(
pretrained_ckpt,
torch_dtype=torch.bfloat16,
)
image_processor = MplugOwlImageProcessor.from_pretrained(pretrained_ckpt)
tokenizer = MplugOwlTokenizer.from_pretrained(pretrained_ckpt)
processor = MplugOwlProcessor(image_processor, tokenizer)
模型推理
準備模型輸入:
prompts = [
'''The following is a conversation between a curious human and AI assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
Human: <image>
Human: Explain why this meme is funny.
AI: ''']
image_list = ['https://xxx.com/image.jpg']
獲取響應:
generate_kwargs = {
'do_sample': True,
'top_k': 5,
'max_length': 512
}
from PIL import Image
images = [Image.open(_) for _ in image_list]
inputs = processor(text=prompts, images=images, return_tensors='pt')
inputs = {k: v.bfloat16() if v.dtype == torch.float else v for k, v in inputs.items()}
inputs = {k: v.to(model.device) for k, v in inputs.items()}
with torch.no_grad():
res = model.generate(**inputs, **generate_kwargs)
sentence = tokenizer.decode(res.tolist()[0], skip_special_tokens=True)
print(sentence)
📄 許可證
本項目採用Apache - 2.0許可證。