InternLM-XComposer2-VL-7B-4Bit: Open-Source Vision-Language Model

Developed by internlm

A vision-language large model based on InternLM2, with strong image-text comprehension and composition capabilities

Image-to-Text

Transformers

License: Other #interleaved-text-image-composition #multimodal-understanding #vision-language-model

Downloads: 1,635

Release date: 2/6/2024

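Model Overview

InternLM-XComposer2-VL is a pretrained vision-language model that uses InternLM2 as its large language model backbone and performs well on multimodal benchmarks.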

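Model Features

Multimodal comprehension and composition

Strong image-text comprehension and composition capabilities, including free-form interleaved text-image composition

Quantized version

A 4-bit quantized version is provided to reduce compute resource requirements

Benchmark performance

Performs well on multimodal benchmarks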
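
To give a rough sense of what the 4-bit version saves, the following back-of-the-envelope sketch estimates storage for the weights alone. It assumes a nominal 7 billion parameters (taken from the "7B" in the model name) and that every weight is stored at the stated bit width. The real footprint will be higher, since quantization metadata, any parts of the model kept at higher precision, and runtime activations are not counted.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate storage for model weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9

# Assumption: a nominal 7e9 parameters, read off the "7B" in the model name.
NUM_PARAMS = 7e9

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)  # 16-bit baseline
int4_gb = weight_memory_gb(NUM_PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f" 4-bit weights: ~{int4_gb:.1f} GB")
```

Under these assumptions the weights shrink by a factor of four; treat the numbers as an order-of-magnitude guide, not a measurement.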

Model Capabilities

Image-text comprehension

Image-text composition

Multimodal interaction

Text generation

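Use Cases

Content creation

Image captioning

Generates a detailed description of an input image

Produces accurate, detailed image descriptions

Interleaved text-image composition

Supports free-form content creation that interleaves text and images

Produces content that combines text and illustrations

Visual question answering

Image content Q&A

Answers a wide range of questions about the content of an image

Understands the image accurately and answers the question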
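
Both captioning and visual question answering come down to the text query sent alongside the image. The quick-start example in this card uses a query that begins with an `<ImageHere>` placeholder. The sketch below shows one way to build such queries; `build_image_query` is a hypothetical helper, not part of the model's API, and placing the placeholder at the start simply mirrors the quick-start query.

```python
IMAGE_PLACEHOLDER = "<ImageHere>"  # placeholder as it appears in the quick-start query

def build_image_query(question: str) -> str:
    """Prefix a question with the image placeholder (hypothetical helper)."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    if question.startswith(IMAGE_PLACEHOLDER):
        return question  # already prefixed; avoid adding it twice
    return IMAGE_PLACEHOLDER + question

# Captioning and VQA differ only in the wording of the question.
caption_query = build_image_query("Please describe this image in detail.")
vqa_query = build_image_query("How many people are in the picture?")
print(caption_query)
print(vqa_query)
```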

🚀 InternLM-XComposer2

InternLM-XComposer2 is a vision-language large model (VLLM) based on InternLM2 for advanced text-image comprehension and composition.

[💻GitHub Repo](https://github.com/InternLM/InternLM-XComposer) [Paper](https://arxiv.org/abs/2401.16420)

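✨ Key Features

We release two versions in the InternLM-XComposer2 series:

InternLM-XComposer2-VL: the pretrained VLLM with InternLM2 as the initialization of the large language model (LLM), achieving strong performance on various multimodal benchmarks.
InternLM-XComposer2: the VLLM fine-tuned for free-form interleaved text-image composition.

This is the 4-bit version of InternLM-XComposer2-VL. Install the latest version of auto_gptq before using it.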
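
Before loading the model, it can help to confirm that auto_gptq is actually present in the environment. The sketch below uses only the standard library. It assumes the PyPI distribution is named `auto-gptq` (imported as `auto_gptq`), and `installed_version` is a hypothetical helper introduced here for illustration.

```python
from importlib import metadata

def installed_version(dist_name: str):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

# Assumption: the PyPI distribution is named "auto-gptq" (imported as auto_gptq).
version = installed_version("auto-gptq")
if version is None:
    print("auto_gptq is not installed; try: pip install --upgrade auto-gptq")
else:
    print(f"auto_gptq {version} is installed")
```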

🚀 Quick Start

We provide a simple example showing how to use InternLM-XComposer with 🤗 Transformers.

import torch, auto_gptq
from transformers import AutoModel, AutoTokenizer 
from auto_gptq.modeling import BaseGPTQForCausalLM

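# Register "internlm" as a model type that AutoGPTQ will accept
auto_gptq.modeling._base.SUPPORTED_MODELS = ["internlm"]
# Inference only: disable gradient tracking globally
torch.set_grad_enabled(False)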

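# Describe the model layout to AutoGPTQ: where the decoder layers live and
# which linear modules inside each layer carry quantized weights.
class InternLMXComposer2QForCausalLM(BaseGPTQForCausalLM):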
    layers_block_name = "model.layers"
    outside_layer_modules = [
        'vit', 'vision_proj', 'model.tok_embeddings', 'model.norm', 'output', 
    ]
    inside_layer_modules = [
        ["attention.wqkv.linear"],
        ["attention.wo.linear"],
        ["feed_forward.w1.linear", "feed_forward.w3.linear"],
        ["feed_forward.w2.linear"],
    ]
 
# init model and tokenizer
model = InternLMXComposer2QForCausalLM.from_quantized(
  'internlm/internlm-xcomposer2-vl-7b-4bit', trust_remote_code=True, device="cuda:0").eval()
tokenizer = AutoTokenizer.from_pretrained(
  'internlm/internlm-xcomposer2-vl-7b-4bit', trust_remote_code=True)

# <ImageHere> marks the position of the image in the prompt
query = '<ImageHere>Please describe this image in detail.'
image = 'examples/image1.webp'
with torch.cuda.amp.autocast():
  response, _ = model.chat(tokenizer, query=query, image=image, history=[], do_sample=False)
print(response)
#The image features a quote by Oscar Wilde, "Live life with no excuses, travel with no regrets." 
#The quote is displayed in white text against a dark background. In the foreground, there are two silhouettes of people standing on a hill at sunset. 
#They appear to be hiking or climbing, as one of them is holding a walking stick. 
#The sky behind them is painted with hues of orange and purple, creating a beautiful contrast with the dark figures.
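
The quick-start call discards the second value returned by `model.chat` and passes an empty `history`. A follow-up conversation about the same image could plausibly be sketched as below. This is a minimal sketch under two assumptions that should be checked against the model's remote code: that the second return value is the updated history, and that the image is passed again on every turn. `run_dialogue` is a hypothetical helper, not part of the model's API.

```python
def run_dialogue(model, tokenizer, image, queries):
    """Ask several questions about one image, threading history between turns."""
    history = []
    responses = []
    for query in queries:
        # Assumption: chat() returns (response, updated_history), and the
        # image is supplied again on each turn.
        response, history = model.chat(
            tokenizer, query=query, image=image, history=history, do_sample=False
        )
        responses.append(response)
    return responses, history
```

With the `model` and `tokenizer` from the quick start, this could be called as `run_dialogue(model, tokenizer, 'examples/image1.webp', ['<ImageHere>Please describe this image in detail.', 'What time of day is it?'])`, inside the same `torch.cuda.amp.autocast()` context.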