InternLM-XComposer2-VL-7B-4Bit: Open-Source Vision-Language Model

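Developed by internlm

A vision-language large model based on InternLM2, with strong text-image comprehension and composition capabilities

Image-to-Text

Transformers

License: Other #InterleavedTextImageComposition #MultimodalUnderstanding #VisionLanguageModel

Downloads: 1,635

Released: 2/6/2024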

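Model Overview

InternLM-XComposer2-VL (書生·浦語) is a pretrained vision-language model that uses InternLM2 as its large language model base. It performs strongly on multimodal benchmarks.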

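Model Features

Multimodal comprehension and composition

Strong text-image comprehension and composition capabilities, with support for free-form interleaved text-image composition

Quantized version

A 4-bit quantized version is provided to reduce compute resource requirements

Strong performance

Performs strongly on multimodal benchmarks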
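
As a rough illustration of why a 4-bit version lowers resource requirements, the sketch below estimates the storage needed for the weights alone at 16 and 4 bits per weight. It assumes "7B" means about 7 billion parameters and that every weight is stored at the stated width. It ignores quantization metadata, any layers left unquantized, and activation memory, so the real footprint of this checkpoint will be higher than the 4-bit figure. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate storage for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: "7B" is taken as roughly 7 billion parameters.
N_PARAMS = 7e9

fp16_gb = weight_memory_gb(N_PARAMS, 16)
int4_gb = weight_memory_gb(N_PARAMS, 4)

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f" 4-bit weights: ~{int4_gb:.1f} GB")
print(f"idealized reduction: {fp16_gb / int4_gb:.0f}x")
```

Under these assumptions the weights shrink from about 14 GB to about 3.5 GB, which is the idealized upper bound on the saving.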

Model Capabilities

Text-image comprehension

Text-image composition

Multimodal interaction

Text generation

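Use Cases

Content creation

Image captioning

Generates a detailed description from an input image

Produces accurate, detailed image descriptions

Interleaved text-image composition

Supports free-form content creation that interleaves text and images

Produces content that combines text and images

Visual question answering

Image content Q&A

Answers a wide range of questions about the content of an image

Understands the image content accurately and answers the question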

🚀 InternLM-XComposer2

InternLM-XComposer2 is a vision-language large model (VLLM) based on InternLM2 for advanced text-image comprehension and composition.

[💻GitHub Repo](https://github.com/InternLM/InternLM-XComposer) [Paper](https://arxiv.org/abs/2401.16420)

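✨ Key Features

We release two versions in the InternLM-XComposer2 series:

InternLM-XComposer2-VL: the pretrained VLLM with InternLM2 as the initialization of the large language model (LLM), achieving strong performance on various multimodal benchmarks.
InternLM-XComposer2: the VLLM fine-tuned for "free-form interleaved text-image composition".

This is the 4-bit version of InternLM-XComposer2-VL. Install the latest version of auto_gptq before use.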
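
Before loading the model, it can help to confirm that the required packages are actually installed. The following is a minimal sketch using only the standard library. It assumes the PyPI distribution names are `auto-gptq`, `transformers`, and `torch`. The helper names `installed_version` and `check_requirements` are hypothetical and are not part of any of these libraries.

```python
from importlib import metadata


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_requirements(dist_names=("auto-gptq", "transformers", "torch")):
    """Map each distribution name (assumed PyPI names) to its version or None."""
    return {name: installed_version(name) for name in dist_names}


# Report what is present in the current environment.
for name, version in check_requirements().items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

If `auto-gptq` is reported as missing, install or upgrade it before running the quick-start example.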

🚀 Quick Start

We provide a simple example showing how to use InternLM-XComposer with 🤗 Transformers.

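import auto_gptq
import torch
from auto_gptq.modeling import BaseGPTQForCausalLM
from transformers import AutoTokenizer

# Register "internlm" as a model type that auto_gptq will accept
auto_gptq.modeling._base.SUPPORTED_MODELS = ["internlm"]
# Inference only: disable gradient tracking globally
torch.set_grad_enabled(False)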

class InternLMXComposer2QForCausalLM(BaseGPTQForCausalLM):
    # Attribute path of the repeated transformer blocks
    layers_block_name = "model.layers"
    # Modules that live outside those blocks
    outside_layer_modules = [
        'vit', 'vision_proj', 'model.tok_embeddings', 'model.norm', 'output',
    ]
    # Linear layers inside each block that hold the quantized weights
    inside_layer_modules = [
        ["attention.wqkv.linear"],
        ["attention.wo.linear"],
        ["feed_forward.w1.linear", "feed_forward.w3.linear"],
        ["feed_forward.w2.linear"],
    ]
 
# init model and tokenizer
model = InternLMXComposer2QForCausalLM.from_quantized(
  'internlm/internlm-xcomposer2-vl-7b-4bit', trust_remote_code=True, device="cuda:0").eval()
tokenizer = AutoTokenizer.from_pretrained(
  'internlm/internlm-xcomposer2-vl-7b-4bit', trust_remote_code=True)

# The <ImageHere> placeholder marks where the image is inserted into the prompt
query = '<ImageHere>Please describe this image in detail.'
image = 'examples/image1.webp'
with torch.cuda.amp.autocast():
  response, _ = model.chat(tokenizer, query=query, image=image, history=[], do_sample=False)
print(response)
# The image features a quote by Oscar Wilde, "Live life with no excuses, travel with no regrets."
# The quote is displayed in white text against a dark background. In the foreground, there are two silhouettes of people standing on a hill at sunset.
# They appear to be hiking or climbing, as one of them is holding a walking stick.
# The sky behind them is painted with hues of orange and purple, creating a beautiful contrast with the dark figures.
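
The same call can be reused for visual question answering by changing the question text. The sketch below wraps the single-turn call from the example above in two small helpers. The names `build_query` and `ask` are hypothetical and are not part of the model's API. The sketch assumes that `model.chat` accepts the same keyword arguments as in the example, and that each question is asked with an empty history.

```python
IMAGE_TOKEN = "<ImageHere>"  # placeholder taken from the example query above


def build_query(question: str) -> str:
    """Prefix the question with the image placeholder unless it is already present."""
    question = question.strip()
    if IMAGE_TOKEN in question:
        return question
    return f"{IMAGE_TOKEN}{question}"


def ask(model, tokenizer, image_path: str, question: str) -> str:
    """Ask one question about one image with the single-turn call shown above."""
    response, _ = model.chat(
        tokenizer,
        query=build_query(question),
        image=image_path,
        history=[],        # single turn: no earlier conversation is passed in
        do_sample=False,   # greedy decoding, as in the example
    )
    return response


# Usage, assuming `model` and `tokenizer` were created as in the example:
#
# with torch.cuda.amp.autocast():
#     print(ask(model, tokenizer, 'examples/image1.webp', 'What text appears in the image?'))
```

Keeping the placeholder logic in one function avoids sending a query that lacks `<ImageHere>` when an image is supplied.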