Aria-sequential_mlp-bnb_nf4
This project provides a BitsAndBytes NF4 quantized model based on Aria-sequential_mlp for image-text-to-text tasks.
Quick Start
The Aria-sequential_mlp-bnb_nf4 model is a BitsAndBytes NF4 quantized version of Aria-sequential_mlp. It requires about 15.5 GB of VRAM and runs on an RTX 3090. It can also run on an RTX 4060 Ti 16 GB, but that is not really practical without device_map="auto". Currently, the model is not sharded into 5 GB files because sharding seems to cause problems when loading serialized BNB models; this may make it impossible to load the model in free-tier Colab.
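For the 16 GB case, the snippet below is one way to let Accelerate spill layers that do not fit onto CPU RAM. It is an illustrative sketch, not part of this card: the max_memory budgets are assumptions and should be tuned to your hardware.
import torch
from transformers import AutoModelForCausalLM

# Illustrative only: cap GPU usage on a 16 GB card and allow overflow to CPU RAM.
# The memory budgets below are assumptions; adjust them for your setup.
model = AutoModelForCausalLM.from_pretrained(
    "leon-se/Aria-sequential_mlp-bnb_nf4",
    device_map="auto",
    max_memory={0: "15GiB", "cpu": "30GiB"},
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)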
Installation
pip install transformers==4.45.0 accelerate==0.34.1 sentencepiece==0.2.0 torchvision requests torch Pillow bitsandbytes
pip install flash-attn --no-build-isolation
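After installing, an optional quick check (not part of the original instructions) confirms the quantization stack imports cleanly:
python -c "import torch, transformers, bitsandbytes; print(torch.__version__, transformers.__version__, bitsandbytes.__version__)"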
Usage Examples
Basic Usage
import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, BitsAndBytesConfig
torch.cuda.set_device(0)
model_id_or_path = "leon-se/Aria-sequential_mlp-bnb_nf4"
model = AutoModelForCausalLM.from_pretrained(model_id_or_path, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id_or_path, trust_remote_code=True)
image_path = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"
image = Image.open(requests.get(image_path, stream=True).raw)
messages = [
    {
        "role": "user",
        "content": [
            {"text": None, "type": "image"},
            {"text": "what is the image?", "type": "text"},
        ],
    }
]
text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=text, images=image, return_tensors="pt")
inputs["pixel_values"] = inputs["pixel_values"].to(model.dtype)
inputs = {k: v.to(model.device) for k, v in inputs.items()}
with torch.inference_mode(), torch.amp.autocast("cuda", dtype=torch.bfloat16):
    output = model.generate(
        **inputs,
        max_new_tokens=500,
        stop_strings=["<|im_end|>"],
        tokenizer=processor.tokenizer,
        do_sample=True,
        temperature=0.9,
    )
output_ids = output[0][inputs["input_ids"].shape[1]:]
result = processor.decode(output_ids, skip_special_tokens=True)
print(result)
print(f'Max allocated memory: {torch.cuda.max_memory_allocated(device="cuda") / 1024 ** 3:.3f}GiB')
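On an RTX 3090 the reported peak should come out around the ~15.5 GB figure quoted in the Quick Start section.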
Advanced Usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "rhymes-ai/Aria-sequential_mlp"
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    llm_int8_enable_fp32_cpu_offload=True,
    llm_int8_skip_modules=["language_model.lm_head", "multi_modal_projector", "vision_tower"],
)
model_nf4 = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=nf4_config, trust_remote_code=True)
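The card does not document how the uploaded checkpoint was produced, but a model quantized this way can be serialized with the standard transformers call. The path and shard size below are assumptions; the large shard size follows the Quick Start note about avoiding 5 GB shards.
# Assumption: save the freshly quantized weights to a hypothetical local folder.
model_nf4.save_pretrained(
    "./Aria-sequential_mlp-bnb_nf4",
    max_shard_size="30GB",  # keep the weights in a single file; 5 GB shards reportedly break loading
)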
License
This project is licensed under the Apache-2.0 license.
Documentation
Model Information
| Property | Details |
| --- | --- |
| Library Name | transformers |
| Base Model | rhymes-ai/Aria-sequential_mlp, rhymes-ai/Aria |
| Pipeline Tag | image-text-to-text |
| License | apache-2.0 |