DolphinVision 72b open-source multimodal model: reasons about and comments on images that popular models refuse to process

Dolphin Vision 72b

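Developed by dphn

DolphinVision 72b is a multimodal model that can reason about and comment on images that other popular models refuse to process.

Image-to-Text

Transformers

License: Other #MultimodalReasoning #UncensoredImageAnalysis #HighAccuracyOCR

Downloads: 296

Released: 6/28/2024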

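Model Overview

DolphinVision 72b is an uncensored multimodal model curated and trained by Quan Nguyen, Eric Hartford, and Cognitive Computations. It handles multimodal tasks that combine images and text.

Model Features

Multimodal capability

Processes images and text together and performs complex multimodal reasoning.

Uncensored

Can reason about and comment on images that other popular models refuse to process.

High performance

Scores close to GPT-4V on VQA v2 and RealWorldQA and above it on MMBench, while trailing it on MMMU (Eval); see the benchmark table.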

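Model Capabilities

Image description

Text generation

OCR

Multimodal reasoning

Use Cases

Image analysis

OCR text recognition

Extracts and recognizes text in images.

Accurately recognizes satirical text in an image.

Content generation

Image description generation

Generates detailed descriptive text from an image.

Produces accurate, detailed image descriptions.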

🚀 DolphinVision 72b 🐬

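DolphinVision 72b is a multimodal model curated and trained by Quan Nguyen (qnguyen3/stablequan), Eric Hartford, and Cognitive Computations. The model is uncensored and can reason about and comment on images that other popular models refuse to process.

Discord: https://discord.gg/h3K4XGj2RH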

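Special thanks to the following sponsors

Crusoe Cloud: provided an excellent on-demand 8xH100 node for training.
TensorWave: provided an 8x MI300X node for evaluation and inference.

🚀 Quick Start

Run the example code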

import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image
import warnings

# Disable some warnings
transformers.logging.set_verbosity_error()
transformers.logging.disable_progress_bar()
warnings.filterwarnings('ignore')

# Set the default device
torch.set_default_device('cuda')  # or 'cpu'

model_name = 'cognitivecomputations/dolphin-vision-72b'

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map='auto',
    trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True)

# Text prompt
prompt = 'Describe this image in detail'

messages = [
    {"role": "user", "content": f'<image>\n{prompt}'}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

print(text)

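# Tokenize the text on each side of '<image>', then join the two halves with -200,
# the sentinel id that marks where the image goes in the sequence
text_chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image>')]
input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=torch.long).unsqueeze(0)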

# Image; sample images can be found in the images folder
image = Image.open('/path/to/image.png')
image_tensor = model.process_images([image], model.config).to(dtype=model.dtype)

# Generate the output
output_ids = model.generate(
    input_ids,
    images=image_tensor,
    max_new_tokens=2048,
    use_cache=True)[0]

print(tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip())
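
The quick-start code splits the prompt on `<image>` and joins the two tokenized halves with the id -200. The sketch below isolates that step so it can be tried without loading the model. The names `splice_image_token` and `toy_encode` are hypothetical, and `toy_encode` is only a stand-in for `tokenizer(chunk).input_ids` (it maps each character to its code point); it is not how the real tokenizer behaves.

```python
IMAGE_PLACEHOLDER = "<image>"
IMAGE_TOKEN_INDEX = -200  # same sentinel value as in the quick-start code


def splice_image_token(text, encode, image_token_index=IMAGE_TOKEN_INDEX):
    """Encode the text around a single <image> placeholder and join with the sentinel id."""
    before, sep, after = text.partition(IMAGE_PLACEHOLDER)
    if not sep:
        raise ValueError("prompt contains no <image> placeholder")
    if IMAGE_PLACEHOLDER in after:
        raise ValueError("this sketch handles exactly one <image> placeholder")
    return encode(before) + [image_token_index] + encode(after)


def toy_encode(chunk):
    # Hypothetical stand-in for tokenizer(chunk).input_ids: one id per character.
    return [ord(ch) for ch in chunk]


ids = splice_image_token("hi <image>\nyo", toy_encode)
print(ids)
```

With the real tokenizer, `encode` would be something like `lambda chunk: tokenizer(chunk).input_ids`, and the resulting list would be wrapped in a tensor as in the quick-start code.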

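💻 Usage Examples

Example 1

Example image

<|im_start|>user
: do OCR on all the text
<|im_start|>assistant
: The text in the image reads as follows:

- STOP USING DOCKER
- Applications should not be trapped in virtual containers like digital fireflies
- After years of software development, nobody has questioned why we put programs in virtual "ships"
- Let's containerize the monolith. I'll quickly spin up 17 microservices. Statements dreamed up by the utterly deranged
- Look at the huge lie they have been using to deceive us all along
- These are real images used to explain Docker
- "Hey, man, I want to run my website in a virtual ocean"
- They have played us for fools

This text is a satirical commentary on the use of Docker in software development, using humor and exaggeration to question the practice.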

📊 Benchmarks

Model	VQA v2	MMBench	MMMU (Eval)	MathVista	RealWorldQA
Dolphin-Vision-72B	83.6	81.2	45.7	47.25	66.4
GPT-4V	84.4	78.1	52.4	-	67.9
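
To make the comparison in the table easier to read, the sketch below computes the per-benchmark gap (Dolphin-Vision-72B minus GPT-4V) from the scores listed above. The function name `score_gaps` is hypothetical; MathVista is skipped because the table gives no GPT-4V score for it.

```python
# Scores copied from the benchmark table: (Dolphin-Vision-72B, GPT-4V).
# None marks a score the table does not report.
scores = {
    "VQA v2": (83.6, 84.4),
    "MMBench": (81.2, 78.1),
    "MMMU (Eval)": (45.7, 52.4),
    "MathVista": (47.25, None),
    "RealWorldQA": (66.4, 67.9),
}


def score_gaps(table):
    """Return Dolphin minus GPT-4V for every benchmark where both scores exist."""
    gaps = {}
    for benchmark, (dolphin, gpt4v) in table.items():
        if dolphin is None or gpt4v is None:
            continue  # no comparison possible
        gaps[benchmark] = round(dolphin - gpt4v, 2)
    return gaps


gaps = score_gaps(scores)
for benchmark, gap in gaps.items():
    print(f"{benchmark}: {gap:+.2f}")
```

A positive gap means Dolphin-Vision-72B scored higher on that benchmark; on these numbers that holds for MMBench only.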

📄 License

License type: other
License name: tongyi-qianwen

📚 Detailed Documentation

Model Information

Property	Details
Base model	Qwen/Qwen2-72B
Model tags	generated_from_trainer, axolotl
Training datasets	cognitivecomputations/Dolphin-2.9, teknium/OpenHermes-2.5, m-a-p/CodeFeedback-Filtered-Instruction, cognitivecomputations/dolphin-coder, cognitivecomputations/samantha-data, microsoft/orca-math-word-problems-200k, Locutusque/function-calling-chatml, internlm/Agent-FLAN
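
Because the quick-start code loads a 72B-class model in float16 with `device_map='auto'`, a rough weights-only memory estimate can help when planning hardware. The sketch below is back-of-the-envelope arithmetic. It assumes about 72 billion parameters (taken from the model name, not an exact count) and 80 GB per GPU (an assumed capacity), and it ignores activations, the KV cache, and the vision encoder.

```python
import math


def weight_memory_gb(num_params, bytes_per_param):
    """Memory needed for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 72e9        # assumption: roughly 72B parameters, from the model name
BYTES_FLOAT16 = 2        # torch.float16 stores each value in 2 bytes
GPU_MEMORY_GB = 80       # assumption: one 80 GB accelerator

fp16_gb = weight_memory_gb(NUM_PARAMS, BYTES_FLOAT16)
min_gpus = math.ceil(fp16_gb / GPU_MEMORY_GB)

print(f"float16 weights: about {fp16_gb:.0f} GB")
print(f"GPUs needed for the weights alone: at least {min_gpus}")
```

In practice more headroom is needed than this lower bound suggests, since generation with `max_new_tokens=2048` also allocates cache and activation memory.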