🚀 CogVLM
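CogVLM is a powerful open-source visual language model (VLM). CogVLM-17B has 10 billion vision parameters and 7 billion language parameters. It achieves state-of-the-art performance on 10 classic cross-modal benchmarks (NoCaps, Flickr30k captioning, RefCOCO, RefCOCO+, RefCOCOg, Visual7W, GQA, ScienceQA, VizWiz VQA, and TDIUC) and ranks second on VQAv2, OKVQA, TextVQA, and COCO captioning, surpassing or matching PaLI-X 55B. This repository provides the CogVLM grounding generalist model quantized to 4-bit precision with bitsandbytes.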
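To give a rough sense of what 4-bit quantization saves, the following back-of-the-envelope sketch estimates the memory taken by the weights alone. It assumes every one of the 17B parameters is stored at the given bit width, which is a simplification: in practice some layers typically stay in higher precision, and quantization constants, activations, and the KV cache add to the total. The helper name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


# 10B vision + 7B language parameters, as stated above.
TOTAL_PARAMS = 17e9

bf16_gib = weight_memory_gib(TOTAL_PARAMS, 16)
int4_gib = weight_memory_gib(TOTAL_PARAMS, 4)

# Idealized figures; the real footprint of the quantized model will be higher.
print(f"bf16: ~{bf16_gib:.1f} GiB, 4-bit: ~{int4_gib:.1f} GiB")
```

Under these assumptions the weights shrink to about a quarter of their bfloat16 size, which is the main reason a quantized checkpoint may fit on a single consumer GPU.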
🚀 Quick Start
Environment Setup
pip install torch==2.2.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install transformers==4.38.1 accelerate==0.27.2 sentencepiece==0.1.99 einops==0.7.0 xformers==0.0.24 protobuf==3.20.3 triton==2.1.0 bitsandbytes==0.43.0.dev0
To use triton and bitsandbytes on Windows, install them from the following wheel files:
pip install bitsandbytes-0.43.0.dev0-cp310-cp310-win_amd64.whl
pip install triton-2.1.0-cp310-cp310-win_amd64.whl
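Because the example depends on exact package versions, it can help to confirm what is actually installed before loading the model. The sketch below is one way to do that with the standard library; the pins are copied from the install command above, and the helper name `check_pins` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pins copied from the install command above.
PINNED = {
    "torch": "2.2.0",
    "transformers": "4.38.1",
    "accelerate": "0.27.2",
    "sentencepiece": "0.1.99",
    "einops": "0.7.0",
    "xformers": "0.0.24",
    "protobuf": "3.20.3",
    "triton": "2.1.0",
    "bitsandbytes": "0.43.0.dev0",
}


def check_pins(pins):
    """Return {package: (expected, installed or None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Builds may carry a local suffix such as "+cu121"; ignore it when comparing.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every pinned package matches; anything printed is worth fixing before running the code example.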
Code Example
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, LlamaTokenizer

# Hugging Face repo ID; replace with a local model folder path if you have one.
model_path = 'Rodeszones/CogVLM-grounding-generalist-hf-quant4'

tokenizer = LlamaTokenizer.from_pretrained('lmsys/vicuna-7b-v1.5')
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True
).eval()

query = 'Can you provide a description of the image and include the coordinates [[x0,y0,x1,y1]] for each mentioned object?'
image = Image.open("your/image/path/here").convert('RGB')

# Build the model inputs, then add a batch dimension and move them to the GPU.
inputs = model.build_conversation_input_ids(tokenizer, query=query, history=[], images=[image])
inputs = {
    'input_ids': inputs['input_ids'].unsqueeze(0).to('cuda'),
    'token_type_ids': inputs['token_type_ids'].unsqueeze(0).to('cuda'),
    'attention_mask': inputs['attention_mask'].unsqueeze(0).to('cuda'),
    'images': [[inputs['images'][0].to('cuda').to(torch.bfloat16)]],
}
gen_kwargs = {"max_length": 2048, "do_sample": False}

with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    # Drop the prompt tokens so only the generated answer is decoded.
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(outputs[0]))
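The query above asks the model to return coordinates in the form [[x0,y0,x1,y1]]. A minimal sketch of pulling those boxes out of the decoded text and scaling them to pixel coordinates might look like this. It assumes the coordinates are integers normalized to a 0 to 999 grid and that several boxes may share one bracket pair separated by `;`. Both are assumptions about the output format, so check them against your own outputs. The helper name `extract_boxes` and the sample string are hypothetical.

```python
import re

# Matches one [[...]] group containing digits, commas, semicolons, and spaces.
BOX_GROUP = re.compile(r"\[\[([\d,;\s]+)\]\]")


def extract_boxes(text, width, height, scale=1000):
    """Return a list of (x0, y0, x1, y1) boxes in pixel coordinates.

    `scale` is the assumed size of the normalized coordinate grid.
    """
    boxes = []
    for group in BOX_GROUP.findall(text):
        for part in group.split(";"):
            values = [int(v) for v in part.split(",") if v.strip()]
            if len(values) != 4:
                continue  # skip malformed entries instead of guessing
            x0, y0, x1, y1 = values
            boxes.append((
                x0 * width / scale,
                y0 * height / scale,
                x1 * width / scale,
                y1 * height / scale,
            ))
    return boxes


# Hypothetical model output, for illustration only.
sample = "A dog [[100,200,500,800]] next to two balls [[050,060,070,080;900,900,999,999]]"
boxes = extract_boxes(sample, width=1000, height=500)
print(boxes)
```

In the inference script, `text` would be the string returned by `tokenizer.decode(outputs[0])`, and `width` and `height` would come from `image.size`.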
📄 License
The code in this repository is open source under the Apache-2.0 license, while use of the CogVLM model weights must comply with the Model License.
📚 Citation
@article{wang2023cogvlm,
title={CogVLM: Visual Expert for Pretrained Language Models},
author={Weihan Wang and Qingsong Lv and Wenmeng Yu and Wenyi Hong and Ji Qi and Yan Wang and Junhui Ji and Zhuoyi Yang and Lei Zhao and Xixuan Song and Jiazheng Xu and Bin Xu and Juanzi Li and Yuxiao Dong and Ming Ding and Jie Tang},
year={2023},
eprint={2311.03079},
archivePrefix={arXiv},
primaryClass={cs.CV}
}