# 🚀 InternLM-XComposer2-4KHD
InternLM-XComposer2-4KHD is a general-purpose vision-language large model (VLLM) based on InternLM2, capable of understanding images at up to 4K resolution.
[💻Github Repo](https://github.com/InternLM/InternLM-XComposer)
[Paper](https://arxiv.org/abs/2401.16420)
## 🚀 Quick Start
We provide a simple example showing how to use InternLM-XComposer with 🤗 Transformers.
### Basic Usage
```python
import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# Load the model in bfloat16 and move it to the GPU for inference.
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2-4khd-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2-4khd-7b', trust_remote_code=True)

# First turn: <ImageHere> marks where the image is inserted into the prompt.
query = '<ImageHere>Illustrate the fine details present in the image'
image = './example.webp'
with torch.cuda.amp.autocast():
    response, history = model.chat(tokenizer, query=query, image=image, hd_num=55, history=[], do_sample=False, num_beams=3)
print(response)

# Second turn: a follow-up question that reuses the conversation history.
query = 'What is the detailed explanation of the third part?'
with torch.cuda.amp.autocast():
    response, _ = model.chat(tokenizer, query=query, image=image, hd_num=55, history=history, do_sample=False, num_beams=3)
print(response)
```
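The `hd_num` argument controls how finely the input image is divided into high-definition patches; the value 55 above targets 4K-resolution inputs. For lower-resolution images or tighter GPU memory, a smaller value can be passed. A minimal sketch reusing the `model` and `tokenizer` loaded above, assuming `hd_num=16` as an illustrative value (not an official recommendation):

```python
# Hedged sketch (not from the official card): same chat API as above,
# but with a smaller, assumed hd_num value to reduce GPU memory usage
# on lower-resolution inputs.
query = '<ImageHere>Describe this image in detail.'
with torch.cuda.amp.autocast():
    response, _ = model.chat(tokenizer, query=query, image='./example.webp',
                             hd_num=16, history=[], do_sample=False, num_beams=3)
print(response)
```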
## 📦 Installation Guide
### Import from Transformers
To load the InternLM-XComposer2-4KHD model with Transformers, use the following code:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

ckpt_path = "internlm/internlm-xcomposer2-4khd-7b"
# The tokenizer runs on the CPU; it has no .cuda() method.
tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True)
# Load the model in bfloat16 and move it to the GPU.
model = AutoModelForCausalLM.from_pretrained(ckpt_path, torch_dtype=torch.bfloat16, trust_remote_code=True).cuda()
model = model.eval()
```
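On multi-GPU machines, Transformers' standard `device_map` option can shard the weights across devices automatically instead of the explicit `.cuda()` call. A minimal sketch, assuming the `accelerate` package is installed (this setup is an assumption, not from the official card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt_path = "internlm/internlm-xcomposer2-4khd-7b"
tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True)
# device_map='auto' lets Accelerate place layers across available devices,
# replacing the explicit .cuda() call above (assumed setup, not from the card).
model = AutoModelForCausalLM.from_pretrained(
    ckpt_path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map='auto',
).eval()
```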
## 📄 License
The code is licensed under Apache-2.0, while the model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form ([English](application form)/Chinese). For other questions or collaborations, please contact internlm@pjlab.org.cn.