# 🚀 llama-3.1-8B-vision-378
This project is a projection module, trained with SigLIP, that gives Llama 3 vision capabilities; it is applied here to Llama-3.1-8B-Instruct. Built by @yeswondwerr and @qtnx_.
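For intuition, below is a minimal sketch of what a LLaVA-style projection module can look like. The class name, layer layout, and dimensions are illustrative assumptions, not the repository's actual implementation: 1152 matches SigLIP-SO400M's hidden size and 4096 matches Llama-3.1-8B's hidden size.

```python
import torch
import torch.nn as nn


class VisionProjector(nn.Module):
    """Hypothetical LLaVA-style projector: maps vision patch embeddings
    into the language model's embedding space."""

    def __init__(self, vision_dim: int = 1152, llm_dim: int = 4096):
        super().__init__()
        # Two-layer MLP, a common choice for LLaVA-style projectors
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, vision_features: torch.Tensor) -> torch.Tensor:
        # (batch, num_patches, vision_dim) -> (batch, num_patches, llm_dim)
        return self.proj(vision_features)
```

The projected patch embeddings are then placed alongside the text token embeddings before being fed to the language model.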
## 📄 License

This project is released under the llama3.1 license.
## 📦 Dataset
- liuhaotian/LLaVA-CC3M-Pretrain-595K
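If you want to inspect the pretraining data locally, here is a minimal sketch using `huggingface_hub.snapshot_download`; this tooling choice is an assumption, as the model card does not prescribe a download method.

```python
from huggingface_hub import snapshot_download

# Download the raw dataset repository files to the local HF cache
local_dir = snapshot_download(
    repo_id="liuhaotian/LLaVA-CC3M-Pretrain-595K",
    repo_type="dataset",
)
print(local_dir)
```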
## 🚀 Quick Start

The model's pipeline tag is `image-text-to-text`. It ships custom code, so it is loaded via `AutoModelForCausalLM` with `trust_remote_code=True`, as shown in the examples below.
## 💻 Usage Examples

### Basic Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image
import requests
from io import BytesIO

# Fetch the demo image
url = "https://huggingface.co/qresearch/llama-3-vision-alpha-hf/resolve/main/assets/demo-2.jpg"
response = requests.get(url)
image = Image.open(BytesIO(response.content))

# Load the model in fp16 on GPU; trust_remote_code is required because
# the vision components ship as custom code in the repository
model = AutoModelForCausalLM.from_pretrained(
    "qresearch/llama-3.1-8B-vision-378",
    trust_remote_code=True,
    torch_dtype=torch.float16,
).to("cuda")

tokenizer = AutoTokenizer.from_pretrained(
    "qresearch/llama-3.1-8B-vision-378",
    use_fast=True,
)

# Ask a question about the image using the repo's custom helper
print(
    model.answer_question(
        image, "Briefly describe the image", tokenizer, max_new_tokens=128, do_sample=True, temperature=0.3
    ),
)
```
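Assuming `answer_question` forwards these keyword arguments to the underlying `generate` call (as its signature suggests), `do_sample=True` with `temperature=0.3` produces mildly varied outputs across runs; pass `do_sample=False` for deterministic greedy decoding.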
### Advanced Usage
```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import requests
from io import BytesIO

# Fetch the demo image
url = "https://huggingface.co/qresearch/llama-3-vision-alpha-hf/resolve/main/assets/demo-2.jpg"
response = requests.get(url)
image = Image.open(BytesIO(response.content))

# 4-bit quantization config; the projector and vision tower are skipped
# so they stay in fp16
bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    llm_int8_skip_modules=["mm_projector", "vision_model"],
)

model = AutoModelForCausalLM.from_pretrained(
    "qresearch/llama-3.1-8B-vision-378",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    quantization_config=bnb_cfg,
)

tokenizer = AutoTokenizer.from_pretrained(
    "qresearch/llama-3.1-8B-vision-378",
    use_fast=True,
)

print(
    model.answer_question(
        image, "Briefly describe the image", tokenizer, max_new_tokens=128, do_sample=True, temperature=0.3
    ),
)
```
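Here `llm_int8_skip_modules` keeps `mm_projector` and `vision_model` in fp16 while the language model weights are quantized to 4-bit. Since these vision components are small relative to the 8B language model, leaving them unquantized costs little memory and helps preserve visual fidelity.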