🚀 Aria-sequential_mlp-bnb_nf4
Aria-sequential_mlp-bnb_nf4 is a BitsAndBytes NF4-quantized version of Aria-sequential_mlp. It needs about 15.5 GB of VRAM and runs on an RTX 3090. It also runs on a 16 GB RTX 4060 Ti, although that setup is not really practical and works only without device_map=auto.
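To make "NF4 quantization" concrete, here is a minimal pure-Python sketch of blockwise 4-bit NormalFloat quantization: each block of weights is scaled by its absolute maximum, and every scaled value is snapped to the nearest of 16 fixed NF4 levels. The level table is taken from the NF4 code book used by bitsandbytes, and the block size of 64 is an assumed default. The real library packs two 4-bit codes per byte, runs on the GPU and, with bnb_4bit_use_double_quant=True, also quantizes the per-block scales; this sketch does none of that. `quantize` and `dequantize` are hypothetical helper names.

```python
# The 16 NF4 levels (normalized to [-1, 1]), as in the bitsandbytes NF4 code book.
NF4_LEVELS = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634, 0.33791524171829224,
    0.44070982933044434, 0.5626170039176941, 0.7229568362236023, 1.0,
]


def quantize_block(block):
    """Map one block of floats to 4-bit level indices plus a single absmax scale."""
    absmax = max(abs(x) for x in block) or 1.0  # avoid dividing by zero for all-zero blocks
    codes = []
    for x in block:
        normed = x / absmax  # now within [-1, 1]
        # Index of the nearest NF4 level (ties resolve to the lower index).
        idx = min(range(len(NF4_LEVELS)), key=lambda i: abs(NF4_LEVELS[i] - normed))
        codes.append(idx)
    return codes, absmax


def quantize(weights, block_size=64):
    """Split a flat weight list into blocks and quantize each block independently."""
    return [quantize_block(weights[i:i + block_size]) for i in range(0, len(weights), block_size)]


def dequantize(qblocks):
    """Rebuild approximate floats: level value times the block's absmax scale."""
    out = []
    for codes, absmax in qblocks:
        out.extend(NF4_LEVELS[i] * absmax for i in codes)
    return out


if __name__ == "__main__":
    weights = [0.5, -0.25, 0.1, 0.0, -1.0, 0.9]
    restored = dequantize(quantize(weights, block_size=4))
    print(restored)
```

Because the largest-magnitude value in each block maps exactly to ±1, it round-trips without error, while the other values land within half the gap between neighboring levels (scaled by the block's absmax).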
Model Information
| Attribute | Details |
|---|---|
| Library name | transformers |
| License | apache-2.0 |
| Base models | rhymes-ai/Aria-sequential_mlp, rhymes-ai/Aria |
| Task | image-text-to-text |
🚀 Quick Start
📦 Installation
pip install transformers==4.45.0 accelerate==0.34.1 sentencepiece==0.2.0 torchvision requests torch Pillow bitsandbytes
pip install flash-attn --no-build-isolation
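The pip command pins exact versions of transformers, accelerate and sentencepiece. A small standard-library sketch like the following can confirm that the active environment really has those versions before you load the model. `check_pins` is a hypothetical helper, and the pins are simply copied from the command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the pip install command above.
PINNED = {
    "transformers": "4.45.0",
    "accelerate": "0.34.1",
    "sentencepiece": "0.2.0",
}


def check_pins(pins, get_version=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for pkg, expected in pins.items():
        try:
            installed = get_version(pkg)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[pkg] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINNED)
    if problems:
        for pkg, (expected, installed) in problems.items():
            print(f"{pkg}: expected {expected}, found {installed}")
    else:
        print("All pinned versions are installed.")
```

The `get_version` parameter exists only so the check can be exercised with a fake lookup; in normal use it defaults to `importlib.metadata.version`.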
💻 Usage Examples
Basic Usage
import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

torch.cuda.set_device(0)

# Load the pre-quantized NF4 checkpoint; Aria ships custom modeling code, hence trust_remote_code=True.
model_id_or_path = "leon-se/Aria-sequential_mlp-bnb_nf4"
model = AutoModelForCausalLM.from_pretrained(model_id_or_path, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id_or_path, trust_remote_code=True)

# Download a sample image.
image_path = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"
image = Image.open(requests.get(image_path, stream=True).raw)

# One user turn: an image placeholder followed by the text prompt.
messages = [
    {
        "role": "user",
        "content": [
            {"text": None, "type": "image"},
            {"text": "what is the image?", "type": "text"},
        ],
    }
]

text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=text, images=image, return_tensors="pt")
# Cast the image tensor to the model's dtype, then move all inputs to the model's device.
inputs["pixel_values"] = inputs["pixel_values"].to(model.dtype)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.inference_mode(), torch.amp.autocast("cuda", dtype=torch.bfloat16):
    output = model.generate(
        **inputs,
        max_new_tokens=500,
        stop_strings=["<|im_end|>"],
        tokenizer=processor.tokenizer,  # generate() needs the tokenizer when stop_strings is set
        do_sample=True,
        temperature=0.9,
    )
    # Drop the prompt tokens so that only the newly generated answer is decoded.
    output_ids = output[0][inputs["input_ids"].shape[1]:]
    result = processor.decode(output_ids, skip_special_tokens=True)

print(result)
print(f'Max allocated memory: {torch.cuda.max_memory_allocated(device="cuda") / 1024 ** 3:.3f}GiB')
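The slice `output[0][inputs["input_ids"].shape[1]:]` relies on generate() returning the prompt tokens followed by the new tokens, as it does for causal language models. The same idea on plain Python lists, with made-up token ids, looks like this; `strip_prompt` is a hypothetical helper.

```python
def strip_prompt(sequence, prompt_length):
    """Return only the newly generated token ids.

    A causal LM's generated sequence starts with the prompt, so slicing off
    the first prompt_length ids leaves just the answer.
    """
    if prompt_length > len(sequence):
        raise ValueError("prompt_length is longer than the generated sequence")
    return sequence[prompt_length:]


if __name__ == "__main__":
    prompt_ids = [101, 7, 8]            # made-up prompt token ids
    generated = prompt_ids + [42, 43]   # what generate() would return for one sample
    print(strip_prompt(generated, len(prompt_ids)))
```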
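Advanced Usage
The following snippet quantizes the base model rhymes-ai/Aria-sequential_mlp to NF4 at load time using a BitsAndBytesConfig: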
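import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "rhymes-ai/Aria-sequential_mlp"

# 4-bit NormalFloat (NF4) weights, bf16 compute, and double quantization of the scales.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    # Allow modules that do not fit on the GPU to be offloaded to the CPU in fp32.
    llm_int8_enable_fp32_cpu_offload=True,
    # Leave the LM head, the multimodal projector and the vision tower unquantized.
    llm_int8_skip_modules=["language_model.lm_head", "multi_modal_projector", "vision_tower"],
)

# The base model uses custom modeling code, so trust_remote_code=True is needed here too.
model_nf4 = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=nf4_config, trust_remote_code=True)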
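`llm_int8_skip_modules` keeps the listed modules out of 4-bit quantization. The sketch below illustrates the idea with a simplified prefix rule: a module is skipped if its name equals a listed entry or is nested under it. The exact matching logic inside transformers may differ (some versions use substring matching), and the module names in the example are hypothetical, not taken from the Aria checkpoint.

```python
# Skip list copied from nf4_config above.
SKIP_MODULES = ["language_model.lm_head", "multi_modal_projector", "vision_tower"]


def is_skipped(module_name, skip_modules=SKIP_MODULES):
    """True if module_name equals a skip entry or is nested under one (simplified prefix rule)."""
    return any(module_name == s or module_name.startswith(s + ".") for s in skip_modules)


def split_modules(module_names, skip_modules=SKIP_MODULES):
    """Partition module names into (left_unquantized, quantized_to_nf4)."""
    kept, quantized = [], []
    for name in module_names:
        (kept if is_skipped(name, skip_modules) else quantized).append(name)
    return kept, quantized


if __name__ == "__main__":
    # Hypothetical module names for illustration only.
    names = [
        "vision_tower.encoder.layers.0.mlp.fc1",
        "multi_modal_projector.linear",
        "language_model.model.layers.0.self_attn.q_proj",
        "language_model.lm_head",
    ]
    kept, quantized = split_modules(names)
    print("left unquantized:", kept)
    print("quantized to NF4:", quantized)
```

Requiring a trailing "." in the prefix check avoids accidentally skipping a sibling module whose name merely starts with the same characters.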
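Notes
⚠️ Important
The model is currently not split into 5 GB shards, because sharding appeared to cause problems when loading the serialized BNB model. As a result, the model may fail to load on the free tier of Colab.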