MiniCPM-V 2.6開源多模態視覺語言模型 - 支持圖文轉文本與多語言處理

首頁

Minicpm V 2 6 Int4

由openbmb開發

MiniCPM-V 2.6是一個多模態視覺語言模型，支持圖像文本到文本的轉換，具備多語言處理能力。

圖像生成文本

Transformers

其他#多模態直播 #即時語音對話 #多語言支持

下載量 122.58k

發布時間 : 8/4/2024

模型概述

MiniCPM-V 2.6是一個基於MiniCPM-V架構的多模態模型，專注於視覺語言任務，能夠處理圖像、文本、視頻等多種輸入，並生成相應的文本輸出。

模型特點

多模態支持

支持圖像、文本、視頻等多種輸入模態，能夠處理複雜的多模態任務。

多語言處理

支持多種語言，具備跨語言處理能力。

高性能

相比前代模型有顯著性能提升，支持即時處理。

模型能力

圖像文本轉換

多語言文本生成

視頻內容分析

光學字符識別

多圖像處理

使用案例

內容生成

圖像描述生成

根據輸入的圖像生成詳細的文本描述。

生成準確且詳細的圖像描述文本。

視頻內容摘要

分析視頻內容並生成簡潔的文本摘要。

生成視頻內容的文本摘要，便於快速理解。

文檔處理

光學字符識別

從圖像或視頻中提取文字信息。

高精度的文字識別和提取。

🚀 MiniCPM-V 2.6 int4

MiniCPM-V 2.6 int4 是 MiniCPM-V 2.6 的 int4 量化版本，使用該版本進行推理時，GPU 內存佔用更低（約 7GB）。

🚀 快速開始

環境依賴

在 NVIDIA GPU 上使用 Huggingface transformers 進行推理。測試環境為 Python 3.10，所需依賴如下：

Pillow==10.1.0
torch==2.1.2
torchvision==0.16.2
transformers==4.40.0
sentencepiece==0.1.99
accelerate==0.30.1
bitsandbytes==0.43.1

代碼示例

# test.py
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6-int4', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6-int4', trust_remote_code=True)
model.eval()

image = Image.open('xx.jpg').convert('RGB')
question = 'What is in the image?'
msgs = [{'role': 'user', 'content': [image, question]}]

res = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(res)

## if you want to use streaming, please make sure sampling=True and stream=True
## the model.chat will return a generator
res = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    temperature=0.7,
    stream=True
)

generated_text = ""
for new_text in res:
    generated_text += new_text
    print(new_text, flush=True, end='')

✨ 主要特性

多語言支持：支持多種語言的處理。
多模態能力：支持圖像、文本輸入，可處理圖像識別、OCR 等任務。
低內存佔用：int4 量化版本顯著降低 GPU 內存使用。

📦 安裝指南

按照上述依賴列表，使用以下命令安裝所需庫：

pip install Pillow==10.1.0 torch==2.1.2 torchvision==0.16.2 transformers==4.40.0 sentencepiece==0.1.99 accelerate==0.30.1 bitsandbytes==0.43.1

💻 使用示例

基礎用法

# test.py
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6-int4', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6-int4', trust_remote_code=True)
model.eval()

image = Image.open('xx.jpg').convert('RGB')
question = 'What is in the image?'
msgs = [{'role': 'user', 'content': [image, question]}]

res = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(res)

高級用法

## if you want to use streaming, please make sure sampling=True and stream=True
## the model.chat will return a generator
res = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    temperature=0.7,
    stream=True
)

generated_text = ""
for new_text in res:
    generated_text += new_text
    print(new_text, flush=True, end='')

📚 詳細文檔

模型信息

屬性	詳情
模型類型	圖像文本到文本模型
訓練數據	openbmb/RLAIF-V-Dataset
基礎模型	openbmb/MiniCPM-V-2_6