🚀 llama-3.1-8B-vision-378
This project is a projection module trained with SigLIP that gives Llama 3 vision capabilities, subsequently applied to Llama-3.1-8B-Instruct. Built by @yeswondwerr and @qtnx_.
📄 License
This project is released under the llama3.1 license.
📦 Datasets
- liuhaotian/LLaVA-CC3M-Pretrain-595K
🚀 Quick Start
The pipeline tag for this model is image-text-to-text.
💻 Usage Examples
Basic usage
```python
import requests
import torch
from io import BytesIO
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the demo image
url = "https://huggingface.co/qresearch/llama-3-vision-alpha-hf/resolve/main/assets/demo-2.jpg"
response = requests.get(url)
image = Image.open(BytesIO(response.content))

# Load the model in float16 on the GPU; trust_remote_code is required
# because the vision projector is defined in the repository's own code
model = AutoModelForCausalLM.from_pretrained(
    "qresearch/llama-3.1-8B-vision-378",
    trust_remote_code=True,
    torch_dtype=torch.float16,
).to("cuda")

tokenizer = AutoTokenizer.from_pretrained(
    "qresearch/llama-3.1-8B-vision-378",
    use_fast=True,
)

print(
    model.answer_question(
        image, "Briefly describe the image", tokenizer, max_new_tokens=128, do_sample=True, temperature=0.3
    ),
)
```
Advanced usage (4-bit quantization)
```python
import requests
import torch
from io import BytesIO
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Download the demo image
url = "https://huggingface.co/qresearch/llama-3-vision-alpha-hf/resolve/main/assets/demo-2.jpg"
response = requests.get(url)
image = Image.open(BytesIO(response.content))

# Quantize the language model to 4 bits, while keeping the projector
# and vision tower unquantized to preserve visual quality
bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    llm_int8_skip_modules=["mm_projector", "vision_model"],
)

model = AutoModelForCausalLM.from_pretrained(
    "qresearch/llama-3.1-8B-vision-378",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    quantization_config=bnb_cfg,
)

tokenizer = AutoTokenizer.from_pretrained(
    "qresearch/llama-3.1-8B-vision-378",
    use_fast=True,
)

print(
    model.answer_question(
        image, "Briefly describe the image", tokenizer, max_new_tokens=128, do_sample=True, temperature=0.3
    ),
)
```
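As a rough sanity check on why the 4-bit configuration above is attractive on consumer GPUs, here is a back-of-the-envelope estimate of weight memory for an 8-billion-parameter language model. These figures are illustrative lower bounds only (not from the model card): real usage also needs room for activations, the KV cache, quantization metadata, and the unquantized vision tower and projector.

```python
# Back-of-the-envelope weight-memory estimate for an 8B-parameter model.
params = 8e9

fp16_gib = params * 2 / 2**30    # 2 bytes per weight in float16
int4_gib = params * 0.5 / 2**30  # 0.5 bytes per weight in 4-bit

print(f"fp16: {fp16_gib:.1f} GiB, 4-bit: {int4_gib:.1f} GiB")
# → fp16: 14.9 GiB, 4-bit: 3.7 GiB
```

This is why the float16 path needs a ~16 GB+ card while the 4-bit path can fit on much smaller GPUs.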