🚀 Qwen2.5-VL-7B-Instruct-gptqmodel-int8
This project provides a GPTQ-INT8 quantized version of Qwen2.5-VL-7B-Instruct, produced with the GPTQModel toolkit. INT8 weight quantization reduces the model's memory footprint and can speed up inference while preserving most of the full-precision model's quality.
🚀 Quick Start
✨ Features
- Quantization: applies the GPTQ-INT8 method (8-bit weights, group size 128) to Qwen2.5-VL-7B-Instruct; see the configuration sketch below.
- Toolkit: uses GPTQModel for calibration, quantization, and saving the quantized checkpoint.
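For reference, this is the quantization configuration the script below constructs (a minimal sketch; the `bits=8` and `group_size=128` values are taken directly from `gptqmodel_quantize.py`):

```python
from gptqmodel import QuantizeConfig

# 8-bit weight-only quantization; weights are split into groups of 128
# channels, and each group gets its own quantization scale/zero-point.
quant_config = QuantizeConfig(bits=8, group_size=128)
```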
📦 Installation
First, make sure Python 3.10 or later is installed. Then install the required dependency:
```bash
pip3 install -v "gptqmodel>=2.2.0" --no-build-isolation
```
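To verify the installation, you can check that the package imports and report its version (this assumes `gptqmodel` exposes a standard `__version__` attribute):

```bash
python3 -c "import gptqmodel; print(gptqmodel.__version__)"
```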
💻 Usage Examples
Basic Usage
To perform quantization, run the script with the source model path, the output path, and the bit width (8 for INT8):

```bash
python3 gptqmodel_quantize.py /path/to/Qwen2.5-VL-7B-Instruct/ /path/to/Qwen2.5-VL-7B-Instruct-gptqmodel-int8 8
```
Here is the full source of `gptqmodel_quantize.py`:
```python
import os

# Set environment variables before CUDA is initialized by any import.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
os.environ["PYTHONUTF8"] = "1"

import fire
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig
from gptqmodel.models.definitions.base_qwen2_vl import BaseQwen2VLGPTQ


def format_qwen2_vl_dataset(image, assistant):
    # Build one chat-format calibration sample: an image plus a captioning
    # prompt from the user, and the reference caption as the assistant reply.
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": "generate a caption for this image"},
            ],
        },
        {"role": "assistant", "content": assistant},
    ]


def prepare_dataset(format_func, n_sample: int = 20) -> list[list[dict]]:
    # Pull image URLs and captions from a public captioning dataset.
    dataset = load_dataset(
        "laion/220k-GPT4Vision-captions-from-LIVIS", split=f"train[:{n_sample}]"
    )
    return [format_func(sample["url"], sample["caption"]) for sample in dataset]


def get_calib_dataset(model):
    if isinstance(model, BaseQwen2VLGPTQ):
        return prepare_dataset(format_qwen2_vl_dataset, n_sample=256)
    raise NotImplementedError(f"Unsupported MODEL: {model.__class__}")


def quantize(model_path: str, output_path: str, bit: int):
    # Group-wise GPTQ quantization: each group of 128 weight channels
    # shares one quantization scale/zero-point.
    quant_config = QuantizeConfig(bits=bit, group_size=128)

    model = GPTQModel.load(model_path, quant_config)
    calibration_dataset = get_calib_dataset(model)
    model.quantize(calibration_dataset, batch_size=8)
    model.save(output_path)

    # Reload the quantized checkpoint and run a quick text-only smoke test.
    model = GPTQModel.load(output_path)
    result = model.generate("Uncovering deep insights begins with")[0]
    print(model.tokenizer.decode(result))


if __name__ == "__main__":
    fire.Fire(quantize)
```
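After quantization, the saved checkpoint can be loaded back through GPTQModel for inference, mirroring the smoke test at the end of the script (a minimal text-only sketch; the path is the output directory passed on the command line above):

```python
from gptqmodel import GPTQModel

# Load the INT8 checkpoint produced by gptqmodel_quantize.py.
model = GPTQModel.load("/path/to/Qwen2.5-VL-7B-Instruct-gptqmodel-int8")

# Text-only generation; the tokenizer is attached to the loaded model.
tokens = model.generate("Uncovering deep insights begins with")[0]
print(model.tokenizer.decode(tokens))
```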
📄 License
This project is licensed under the MIT license.
| Property | Details |
|----------|---------|
| License | MIT |
| Base Model | Qwen/Qwen2.5-VL-7B-Instruct |
| Pipeline Tag | image-text-to-text |
| Library Name | transformers |
| Tags | text-generation-inference |