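🚀 Yi Vision Language Model (Yi-VL-6B), Hugging Face Version
This is the Hugging Face version of the Yi Vision Language Model (Yi-VL-6B). You can fine-tune this model on downstream tasks; we recommend our efficient fine-tuning toolkit: https://github.com/hiyouga/LLaMA-Factory .
✨ Key Features
📦 Installation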
The documentation does not give specific installation steps. Install the dependencies by following their official documentation: transformers, torch, Pillow, and requests.
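As a quick sanity check before running the examples, the sketch below reports which of the packages named above cannot be imported in the current environment. The helper name `missing_packages` is hypothetical and not part of any of these libraries; note that Pillow is imported under the name `PIL`.

```python
import importlib.util

# Import names of the dependencies listed above (Pillow is imported as "PIL").
REQUIRED = ["transformers", "torch", "PIL", "requests"]


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

This only checks that each package can be located; it does not verify versions.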
💻 Usage Examples
Basic Usage
import requests
from PIL import Image
import torch
import transformers
from torch import nn
from transformers import AutoProcessor, AutoModelForVision2Seq, LlavaConfig


# Yi-VL projector: a LayerNorm follows each linear layer.
class LlavaMultiModalProjectorYiVL(nn.Module):
    def __init__(self, config: "LlavaConfig"):
        super().__init__()
        self.linear_1 = nn.Linear(config.vision_config.hidden_size, config.text_config.hidden_size, bias=True)
        self.linear_2 = nn.LayerNorm(config.text_config.hidden_size, bias=True)
        self.linear_3 = nn.Linear(config.text_config.hidden_size, config.text_config.hidden_size, bias=True)
        self.linear_4 = nn.LayerNorm(config.text_config.hidden_size, bias=True)
        self.act = nn.GELU()

    def forward(self, image_features):
        hidden_states = self.linear_1(image_features)
        hidden_states = self.linear_2(hidden_states)
        hidden_states = self.act(hidden_states)
        hidden_states = self.linear_3(hidden_states)
        hidden_states = self.linear_4(hidden_states)
        return hidden_states


# Swap in the Yi-VL projector before loading, so from_pretrained builds it
# instead of the stock LLaVA projector.
transformers.models.llava.modeling_llava.LlavaMultiModalProjector = LlavaMultiModalProjectorYiVL

model_id = "BUAADreamer/Yi-VL-6B-hf"

messages = [
    {"role": "user", "content": "<image>What's in the picture?"}
]

image_file = "http://images.cocodataset.org/val2017/000000039769.jpg"

# Load the model in half precision and move it to GPU 0.
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).to(0)

processor = AutoProcessor.from_pretrained(model_id)

# Build the prompt text and fetch the image.
text = [processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)]
images = [Image.open(requests.get(image_file, stream=True).raw)]

inputs = processor(text=text, images=images, return_tensors='pt').to(0, torch.float16)

output = model.generate(**inputs, max_new_tokens=200)
# batch_decode returns a list of strings, one per sequence; take the first.
output = processor.batch_decode(output, skip_special_tokens=True)
print(output[0].split("Assistant:")[-1].strip())
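The last step of the example keeps only the text that follows the final "Assistant:" marker in the decoded string. The sketch below isolates that post-processing on plain strings so it can be tried without loading the model. The helper name `extract_reply` and the sample string are hypothetical; the actual decoded format depends on the model's chat template.

```python
def extract_reply(decoded: str, marker: str = "Assistant:") -> str:
    """Return the text after the last occurrence of `marker`, stripped.

    If the marker is absent, the whole string (stripped) is returned,
    which is what str.split(marker)[-1] yields in that case.
    """
    return decoded.split(marker)[-1].strip()


# Illustrative decoded string; real output depends on the model and template.
sample = "Human: What's in the picture?\nAssistant: Two cats lying on a couch."
print(extract_reply(sample))
```

Because `batch_decode` returns a list with one string per generated sequence, a helper like this would be applied to each element, for example `[extract_reply(s) for s in output]`.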
Advanced Usage
You can also launch a web demo with the LLaMA-Factory CLI:
llamafactory-cli webchat \
--model_name_or_path BUAADreamer/Yi-VL-6B-hf \
--template yivl \
--visual_inputs
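If the demo needs to be started from a script, the same command can be assembled in Python. This is a sketch that assumes `llamafactory-cli` is installed and on the PATH; the helper name `build_webchat_command` is hypothetical, and the flags are copied verbatim from the command above rather than checked against a particular LLaMA-Factory release.

```python
import shlex


def build_webchat_command(model_path: str, template: str = "yivl") -> list:
    """Build the argument list for the webchat command shown above."""
    return [
        "llamafactory-cli", "webchat",
        "--model_name_or_path", model_path,
        "--template", template,
        "--visual_inputs",
    ]


cmd = build_webchat_command("BUAADreamer/Yi-VL-6B-hf")
# Print the shell-quoted command for inspection; nothing is launched here.
print(shlex.join(cmd))
# To actually launch it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues if the model path is a local directory containing spaces.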
📚 Detailed Documentation
| Metric | Value |
| --- | --- |
| MMMU_val | 36.8 |
| CMMMU_val | 32.2 |
📄 License
This project is released under the Yi Series Models License.