Llama 2 7B Model 4-bit GPTQ Quantized Version - Open-source and Free, Focused on Python Code Generation

Llama 2 7b Int4 GPTQ Python Code 20k

Developed by edumunozsala

This is a 4-bit GPTQ quantized version of the Llama 2 7B model, specifically fine-tuned for Python code generation tasks

Large Language Model

Transformers

Open Source License: GPL-3.0

Tags: #4-bit quantization #Python code generation #GPTQ optimization

Downloads 22

Release date: 9/4/2023

Model Overview

This model is a 4-bit GPTQ-quantized version of a Llama 2 7B model fine-tuned for Python code generation. The underlying fine-tuned model was trained with QLoRA in 4-bit using the PEFT library and bitsandbytes; GPTQ quantization was then applied to that fine-tuned model.

Model Features

4-bit GPTQ quantization

Uses the GPTQ algorithm for 4-bit weight quantization, which substantially reduces model size compared with 16-bit weights while aiming to preserve output quality

Python code optimization

Specifically fine-tuned for Python code generation tasks

Efficient inference

The quantized model can run efficiently even on consumer-grade GPUs
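A rough back-of-the-envelope estimate suggests why 4-bit weights matter on consumer GPUs. The sketch below assumes a nominal 7 billion parameters and one 16-bit scale plus one 4-bit zero point per group of 128 weights. It ignores layers that stay unquantized, activations, and the KV cache, so the figures are illustrative rather than measured.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate storage needed for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 1024**3


N_PARAMS = 7e9     # assumption: nominal parameter count for a "7B" model
GROUP_SIZE = 128   # group size listed in this card

# Assumption: each group stores one fp16 scale (16 bits) and one 4-bit zero point.
overhead_bits = (16 + 4) / GROUP_SIZE

fp16_gib = weight_memory_gib(N_PARAMS, 16)
int4_gib = weight_memory_gib(N_PARAMS, 4 + overhead_bits)

print(f"fp16 weights : {fp16_gib:.1f} GiB")
print(f"4-bit weights: {int4_gib:.1f} GiB")
```

Under these assumptions the quantized weights need roughly a quarter of the 16-bit footprint, which is what brings a 7B model within reach of GPUs with limited memory.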

Model Capabilities

Python code generation

Code completion

Code explanation

Use Cases

Development assistance

Code autocompletion

Helps developers quickly generate Python code snippets

Code explanation

Provides explanations and documentation for existing code

Education

Programming learning aid

Provides example code and solutions for programming learners

🚀 Llama 2 7b 4-bit GPTQ Python Coder 👩‍💻

This model is the GPTQ-quantized version of my Llama 2 7B 4-bit Python Coder. It offers an efficient way to handle Python coding tasks with a reduced model size. The base model link is here

✨ Features

Quantization Parameters:
- 4-bit quantization
- Group size: 128
- Calibration dataset: C4
- Descending activation order (desc_act): False
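To make the "4-bit" and "group size" settings concrete, here is a minimal sketch of group-wise quantization using plain round-to-nearest. This is not the GPTQ algorithm, which additionally uses calibration data to compensate for rounding error. It only illustrates how each group of weights shares one scale and one offset. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def quantize_group(weights: List[float], bits: int = 4) -> Tuple[List[int], float, float]:
    """Asymmetric round-to-nearest quantization of one group of weights."""
    levels = 2**bits - 1                      # 15 distinct steps for 4-bit codes
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0   # fall back to 1.0 for constant groups
    codes = [round((w - w_min) / scale) for w in weights]
    return codes, scale, w_min


def dequantize_group(codes: List[int], scale: float, w_min: float) -> List[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale + w_min for c in codes]


def quantize_groupwise(weights: List[float], group_size: int = 128, bits: int = 4):
    """Quantize a flat weight list in consecutive groups of `group_size`."""
    return [
        quantize_group(weights[i:i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]


# Toy data standing in for one row of a weight matrix.
weights = [math.sin(i) for i in range(256)]
groups = quantize_groupwise(weights, group_size=128)
restored = [w for group in groups for w in dequantize_group(*group)]
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"{len(groups)} groups, max reconstruction error {max_err:.4f}")
```

A smaller group size gives each scale fewer weights to cover, which lowers the rounding error at the cost of storing more scales.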

📚 Documentation

Model Description

Llama 2 7B 4-bit Python Coder is a fine-tuned version of the Llama 2 7B model. It was fine-tuned with QLoRA in 4-bit using the PEFT library and bitsandbytes to improve its performance on Python coding tasks.

Quantization

A quick definition, taken from Benjamin Marie's Medium article "GPTQ or bitsandbytes: Which Quantization Method to Use for LLMs — Examples with Llama 2" (available only to Medium subscribers):

"GPTQ (Frantar et al., 2023) was first applied to models ready to deploy. In other words, once the model is fully fine-tuned, GPTQ will be applied to reduce its size. GPTQ can lower the weight precision to 4-bit or 3-bit. In practice, GPTQ is mainly used for 4-bit quantization. 3-bit has been shown very unstable (Dettmers and Zettlemoyer, 2023). It quantizes without loading the entire model into memory. Instead, GPTQ loads and quantizes the LLM module by module. Quantization also requires a small sample of data for calibration which can take more than one hour on a consumer GPU."
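The workflow described in the quote (calibrate on a small dataset, then quantize an already fine-tuned model) is exposed in the transformers library through `GPTQConfig`. The sketch below plugs in the parameters listed in this card. It is not necessarily the script the author ran: the function name, the source model ID, and the output path are placeholders, and it assumes a GPU plus a GPTQ backend (for example optimum with auto-gptq) is installed.

```python
# Settings taken from this card; everything else is left at the library defaults.
GPTQ_SETTINGS = {"bits": 4, "group_size": 128, "dataset": "c4", "desc_act": False}


def quantize_with_gptq(source_model_id: str, output_dir: str) -> None:
    """Quantize a fine-tuned checkpoint with GPTQ and save the result."""
    # Imported lazily so the settings above can be inspected without heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

    tokenizer = AutoTokenizer.from_pretrained(source_model_id)
    config = GPTQConfig(tokenizer=tokenizer, **GPTQ_SETTINGS)

    # Passing a GPTQConfig at load time runs calibration and quantization.
    model = AutoModelForCausalLM.from_pretrained(
        source_model_id, quantization_config=config, device_map="auto"
    )
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)


# Hypothetical usage (placeholder model ID and path):
# quantize_with_gptq("your-user/your-fine-tuned-fp16-model", "llama-2-7b-gptq")
```

The source checkpoint here would be the fine-tuned model in full or half precision, since GPTQ is applied after fine-tuning is complete.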

💻 Usage Examples

Basic Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "edumunozsala/llama-2-7b-int4-GPTQ-python-code-20k"

tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

instruction = "Write a Python function to display the first and last elements of a list."
# Optional extra context for the task; named input_text to avoid shadowing the built-in input()
input_text = ""

prompt = f"""### Instruction:
Use the Task below and the Input given to write the Response, which is a programming code that can solve the Task.

### Task:
{instruction}

### Input:
{input_text}

### Response:
"""

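# Move the prompt tokens to whichever device device_map="auto" placed the model on
input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.to(model.device)

# Disable gradient tracking during generation to save memory
with torch.inference_mode():
    outputs = model.generate(input_ids=input_ids, max_new_tokens=128, do_sample=True, top_p=0.9, temperature=0.3)

# Decode only the newly generated tokens, skipping the echoed prompt
generated = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)

print(f"Prompt:\n{prompt}\n")
print(f"Generated code:\n{generated}")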
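The example above relies on a fixed prompt template, so it can be convenient to wrap that template in a small helper when sending several tasks to the model. This is a minimal sketch; `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names, not part of the model's repository.

```python
PROMPT_TEMPLATE = """### Instruction:
Use the Task below and the Input given to write the Response, which is a programming code that can solve the Task.

### Task:
{instruction}

### Input:
{input_text}

### Response:
"""


def build_prompt(instruction: str, input_text: str = "") -> str:
    """Fill the card's prompt template with a task and optional input."""
    return PROMPT_TEMPLATE.format(
        instruction=instruction.strip(), input_text=input_text.strip()
    )


prompt = build_prompt(
    "Write a Python function to display the first and last elements of a list."
)
print(prompt)
```

The resulting string can be passed to the tokenizer in place of the inline f-string.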

Citation

@misc {edumunozsala_2023,
	author       = { {Eduardo Muñoz} },
	title        = { llama-2-7b-int4-GPTQ-python-coder },
	year         = 2023,
	url          = { https://huggingface.co/edumunozsala/llama-2-7b-int4-GPTQ-python-code-20k },
	publisher    = { Hugging Face }
}

📄 License

This project is licensed under the GPL-3.0 license.
