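Model overview

A quantized version of GLM-4V-9B focused on document, image, and chart question answering. It supports interaction in 12 languages and performs well on several benchmarks.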

Model features

Efficient quantization

The 4-bit quantized version uses less than 9 GB of memory and runs on the free tier of Google Colab.
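As a rough back-of-the-envelope check on the memory figure above, the sketch below estimates weight storage from a parameter count and a bit width. The parameter count used here is a placeholder assumption, not a figure taken from this model card, and the estimate ignores activations, the KV cache, and any layers kept in higher precision.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Hypothetical parameter count, used only for illustration.
ASSUMED_PARAMS = 9e9

fp16_gb = weight_memory_gb(ASSUMED_PARAMS, 16)
int4_gb = weight_memory_gb(ASSUMED_PARAMS, 4)
print(f"16-bit weights: {fp16_gb:.1f} GB, 4-bit weights: {int4_gb:.1f} GB")
```

Under that assumption, going from 16 bits to 4 bits per weight cuts weight storage to a quarter, which is the kind of reduction that makes a free Colab GPU plausible.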

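Multilingual support

Supports interaction in 12 languages, with the best performance in English and Chinese.

Strong performance

Outperforms mainstream models such as GPT-4-turbo and Gemini 1.0 Pro on document and image question answering tasks.

Long context support

Supports a context length of 8K tokens.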

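Model capabilities

Document understanding

Image analysis

Chart parsing

Multilingual text generation

Visual question answering

Multimodal reasoning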

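Use cases

Education

Textbook content analysis

Parses the text and figures in textbooks and answers related questions.

Accurately understands the charts and text in textbooks.

Business

Business report analysis

Automatically extracts and analyzes key data and charts from business reports.

Quickly generates report summaries and key metrics.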

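🚀 Multimodal Multilingual Model (3ML)

This model is a 4-bit quantized version of glm-4v-9b (less than 9 GB). It excels at document, image, and chart question answering, outperforming GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.

Parts of the original model were modified so that it can run on the free tier of Google Colab.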

Try it now:

![GitHub source code]

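⚠️ Important note

For the best performance in document and image understanding, use English or Chinese. The model can still hold conversations in any of its supported languages.

About GLM-4V-9B

GLM-4V-9B is a multimodal language model with visual understanding capabilities. Its evaluation results on related classic tasks are as follows: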

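Model	MMBench-EN-Test	MMBench-CN-Test	SEEDBench_IMG	MMStar	MMMU	MME	HallusionBench	AI2D	OCRBench
Task type	English, general	Chinese, general	General ability	General ability	Multi-discipline	Perception and reasoning	Hallucination	Diagram understanding	Text recognition (OCR)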
GPT-4o, 20240513	83.4	82.1	77.1	63.9	69.2	2310.3	55	84.6	736
GPT-4v, 20240409	81	80.2	73	56	61.7	2070.2	43.9	78.6	656
GPT-4v, 20231106	77	74.4	72.3	49.7	53.8	1771.5	46.5	75.9	516
InternVL-Chat-V1.5	82.3	80.7	75.2	57.1	46.8	2189.6	47.4	80.6	720
LlaVA-Next-Yi-34B	81.1	79	75.7	51.6	48.8	2050.2	34.8	78.9	574
Step-1V	80.7	79.9	70.3	50	49.9	2206.4	48.4	79.2	625
MiniCPM-Llama3-V2.5	77.6	73.8	72.3	51.8	45.8	2024.6	42.4	78.4	725
Qwen-VL-Max	77.6	75.7	72.7	49.5	52	2281.7	41.2	75.7	684
GeminiProVision	73.6	74.3	70.7	38.6	49	2148.9	45.7	72.9	680
Claude-3V Opus	63.3	59.2	64	45.7	54.9	1586.8	37.8	70.6	694
GLM-4v-9B	81.1	79.4	76.8	58.7	47.2	2163.8	46.6	81.1	786
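To make the table easier to read at a glance, here is a minimal sketch that ranks the models on a chosen benchmark. The scores are copied from the AI2D and OCRBench columns above; the `SCORES` dictionary and the `rank` helper are illustrative names, not part of any library.

```python
# (AI2D, OCRBench) scores copied from the table above.
SCORES = {
    "GPT-4o, 20240513": {"AI2D": 84.6, "OCRBench": 736},
    "GPT-4v, 20240409": {"AI2D": 78.6, "OCRBench": 656},
    "GPT-4v, 20231106": {"AI2D": 75.9, "OCRBench": 516},
    "InternVL-Chat-V1.5": {"AI2D": 80.6, "OCRBench": 720},
    "LlaVA-Next-Yi-34B": {"AI2D": 78.9, "OCRBench": 574},
    "Step-1V": {"AI2D": 79.2, "OCRBench": 625},
    "MiniCPM-Llama3-V2.5": {"AI2D": 78.4, "OCRBench": 725},
    "Qwen-VL-Max": {"AI2D": 75.7, "OCRBench": 684},
    "GeminiProVision": {"AI2D": 72.9, "OCRBench": 680},
    "Claude-3V Opus": {"AI2D": 70.6, "OCRBench": 694},
    "GLM-4v-9B": {"AI2D": 81.1, "OCRBench": 786},
}


def rank(model: str, benchmark: str) -> int:
    """Return the 1-based rank of `model` on `benchmark` (higher score is better)."""
    ordered = sorted(SCORES, key=lambda m: SCORES[m][benchmark], reverse=True)
    return ordered.index(model) + 1


for bench in ("AI2D", "OCRBench"):
    print(bench, "rank of GLM-4v-9B:", rank("GLM-4v-9B", bench))
```

On these two columns GLM-4v-9B ranks first on OCRBench and second on AI2D, behind GPT-4o. The same helper can be extended to the other columns by adding their scores to the dictionary.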

This repository hosts the 4-bit quantized version of the GLM-4V-9B model, which supports a context length of 8K.
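A small sketch of how one might check a request against that context length before calling the model. It assumes that "8K" means 8192 tokens and that the prompt count already includes whatever tokens the image contributes; both are assumptions, and `fits_context` is a hypothetical helper, not an API of the model.

```python
# Assumption: "8K" is read as 8 * 1024 = 8192 tokens.
CONTEXT_LIMIT = 8 * 1024


def fits_context(prompt_tokens: int, max_new_tokens: int,
                 context_limit: int = CONTEXT_LIMIT) -> bool:
    """True if the prompt plus the requested new tokens stays within the limit."""
    return prompt_tokens + max_new_tokens <= context_limit


print(fits_context(prompt_tokens=2000, max_new_tokens=2500))
print(fits_context(prompt_tokens=7000, max_new_tokens=2500))
```

When the check fails, the usual options are to shorten the prompt or to request fewer new tokens.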

🚀 Quick start

You can use the model on Colab or run the following Python script:

💻 Usage example

Basic usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

device = "cuda"

# Hugging Face repository of the 4-bit quantized model
model_path = "nikravan/glm-4vq"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map="auto",
)

query = "explain all the details in this picture"
image = Image.open("a3.png").convert("RGB")

# Build the chat prompt; the image is attached to the user message (chat with image mode)
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "image": image, "content": query}],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
)
inputs = inputs.to(device)

# With top_k=1, sampling always picks the most likely token at each step
gen_kwargs = {"max_length": 2500, "do_sample": True, "top_k": 1}
with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    # Drop the prompt tokens so that only the generated answer is decoded
    outputs = outputs[:, inputs["input_ids"].shape[1]:]
    print(tokenizer.decode(outputs[0]))
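Two details of the script are easy to miss: the slice after `generate` removes the prompt from the returned sequence, and `max_length` in `transformers` counts prompt tokens plus generated tokens together. The sketch below illustrates both with plain Python lists standing in for tensors; the token IDs and the helper names `strip_prompt` and `new_token_budget` are made up for illustration.

```python
def strip_prompt(sequences, prompt_len):
    """Keep only the tokens that come after the prompt in each sequence."""
    return [seq[prompt_len:] for seq in sequences]


def new_token_budget(max_length, prompt_len):
    """Tokens left for generation when max_length covers prompt + output."""
    return max(0, max_length - prompt_len)


# Made-up token IDs: the model output starts with a copy of the prompt.
prompt_ids = [101, 102, 103]
output_ids = [[101, 102, 103, 7, 8, 9]]

answer_ids = strip_prompt(output_ids, len(prompt_ids))
budget = new_token_budget(2500, len(prompt_ids))
print(answer_ids)
print(budget)
```

A long prompt therefore leaves less room for the answer under a fixed `max_length`; `generate` also accepts `max_new_tokens` if a fixed answer budget is preferred.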