
Llama-2-7b-hf-4bit-64rank

Developed by LoftQ
LoftQ (LoRA-Fine-Tuning-aware Quantization) provides a quantized backbone network together with LoRA adapters whose initialization is computed jointly with the quantization, so that large language models can be quantized and then LoRA fine-tuned with less loss in performance and efficiency.
Downloads 1,754
Release Time: 11/21/2023

Model Overview

This model is Llama-2-7b quantized to 4 bits with the LoftQ method, shipped together with LoRA adapters initialized to compensate for quantization error. It addresses the performance gap that arises when LoRA fine-tuning is applied to quantized large language models.
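A minimal loading sketch in Python, assuming the transformers, peft, and bitsandbytes libraries and the layout LoftQ checkpoints typically use on Hugging Face (quantized backbone in the repository root, adapters under a loftq_init subfolder); verify the exact settings against the files in the model repository.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

MODEL_ID = "LoftQ/Llama-2-7b-hf-4bit-64rank"

# Load the 4-bit NF4 quantized backbone.
base_model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)

# Attach the LoftQ-initialized LoRA adapters on top of the backbone.
# The "loftq_init" subfolder name is an assumption taken from the
# usual LoftQ checkpoint layout; confirm it in the model repo.
model = PeftModel.from_pretrained(
    base_model,
    MODEL_ID,
    subfolder="loftq_init",
    is_trainable=True,  # set False for inference-only use
)
```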

Model Features

Quantization Support
Provides a 4-bit quantized backbone network, substantially reducing storage and compute requirements.
LoRA Fine-tuning Awareness
The quantization is performed with downstream LoRA fine-tuning in mind, preserving fine-tuning performance and efficiency.
Efficient Storage
The quantized model size is approximately 4.2 GiB, suitable for resource-constrained environments (a rough size estimate is sketched below).
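A back-of-the-envelope check of that figure; the parameter count is the published Llama-2-7b number, while the overhead reasoning is an illustrative assumption rather than a measurement of the model files.

```python
# Rough size estimate for 4-bit quantization of Llama-2-7b.
params = 6.74e9                              # published Llama-2-7b parameter count
quantized_bytes = params * 4 / 8             # 4 bits per weight
print(f"{quantized_bytes / 2**30:.2f} GiB")  # ~3.14 GiB

# Embeddings and norm layers usually stay in 16-bit, and NF4 stores
# per-block scaling constants, which pushes the on-disk total toward
# the ~4.2 GiB reported above.
```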

Model Capabilities

Text Generation
LoRA Fine-tuning
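As a text-generation illustration, a short sketch continuing from the loading example above (it reuses `model` and `MODEL_ID`; the prompt is arbitrary):

```python
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
inputs = tokenizer("The LoftQ method works by", return_tensors="pt").to(model.device)

# Greedy decoding for a short continuation.
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```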

Use Cases

Mathematical Problem Solving
GSM8K Mathematical Problem Solving
After fine-tuning on the GSM8K dataset, the model can be used to solve mathematical problems.
The accuracy of the fine-tuned model on GSM8K is 35.0%.
Text Generation
WikiText-2 Text Generation
When fine-tuned on the WikiText-2 dataset, the model generates coherent text.
The perplexity of the fine-tuned model on WikiText-2 is 5.24.
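A condensed fine-tuning sketch for the WikiText-2 case, assuming the Hugging Face datasets library and the transformers Trainer API; the hyperparameters and output path are placeholders, not the recipe behind the numbers above.

```python
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from peft import PeftModel

MODEL_ID = "LoftQ/Llama-2-7b-hf-4bit-64rank"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token

base = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)
# Only the LoRA adapters are trained; the 4-bit backbone stays frozen.
model = PeftModel.from_pretrained(base, MODEL_ID,
                                  subfolder="loftq_init", is_trainable=True)

dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="loftq-wikitext2",    # hypothetical output path
        per_device_train_batch_size=1,   # placeholder hyperparameters
        gradient_accumulation_steps=16,
        learning_rate=1e-4,
        num_train_epochs=1,
        bf16=True,
        logging_steps=50,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```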