# ACIP applied to jeffwan/llama-7b-hf
This model repository, part of the ACIP Project, offers a compressible version of the `jeffwan/llama-7b-hf` model. For more details, visit our code repo.
[ GitHub | Paper | Website ]
## Quick Start

Load the ACIP model using `from_pretrained`:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("MerantixMomentum/acip_llama1_7b", trust_remote_code=True)
```
This will download and create a fully parameterized ACIP model that can be pruned to any compression rate you want. For example:

```python
model.prune_model_by_score(size_ratio=0.4)
```
This will prune the model to 40% of its original size in terms of the number of parameters, which means a 60% compression rate.
A unique feature of ACIP is that this operation is reversible: you can rerun `model.prune_model_by_score` as many times as you like to evaluate your model at different sizes, as in the sketch below.
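For example, here is a minimal sketch of such a sweep; the `evaluate_perplexity` helper is hypothetical and stands in for whatever evaluation routine you use:

```python
# Sweep several target sizes with the same in-memory model; pruning by score
# is reversible, so the model does not need to be reloaded between ratios.
for ratio in [0.8, 0.6, 0.4]:
    model.prune_model_by_score(size_ratio=ratio)
    ppl = evaluate_perplexity(model)  # hypothetical helper: plug in your own eval
    print(f"size_ratio={ratio}: perplexity={ppl:.2f}")
```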
Finally, you can "commit" to a certain ratio and run:

```python
model.compress()
```
This will discard all pruned mask values of compressible linear layers.
Now the model is actually compressed, and you should notice a significant decrease in memory usage (this step is not reversible without reloading the ACIP model).
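To sanity-check the result, you can inspect the parameter count of the compressed model (a plain-PyTorch sketch with no extra dependencies):

```python
# Count the parameters remaining after compression.
num_params = sum(p.numel() for p in model.parameters())
print(f"compressed model has {num_params:,} parameters")
```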
If you want, you can also run:

```python
model.quantize()
```

to save even more memory (we have only tested 4-bit quantization with `bitsandbytes`, but you can customize this).
That's it! You can now use your compressed model for inference or fine-tuning like any other Causal Language Model from 🤗 transformers.
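For instance, a minimal inference sketch. This assumes the tokenizer is available under the same repo id and that the ACIP wrapper exposes the usual `generate` API, as the note above suggests:

```python
from transformers import AutoTokenizer

# Assumption: the tokenizer ships with the same repository.
tokenizer = AutoTokenizer.from_pretrained("MerantixMomentum/acip_llama1_7b")
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```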
## ⚠️ Important Note
The parameter `size_ratio` ranges from 1.0 to 0.0, indicating the model size after compression. For example, 0.4 means that the model has only 40% of the original number of parameters, and 1.0 means no compression at all. Alternatively, you can also set `compression_rate` in `prune_model_by_score`, which is equivalent to `size_ratio = 1.0 - compression_rate`.
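For instance, the following two calls select the same target size:

```python
# Equivalent calls: prune to 40% of the original parameter count.
model.prune_model_by_score(size_ratio=0.4)
model.prune_model_by_score(compression_rate=0.6)
```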
## Installation
To run an ACIP model from our hub, you only need minimal dependencies, namely `torch`, `transformers`, `peft`, and optionally, `bitsandbytes` if you want to quantize your model.
See requirements.txt for pip-installable dependencies with exact version pins (newer versions should work as well).
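For example, an unpinned install might look like this (`bitsandbytes` is only needed if you want to quantize):

```bash
pip install torch transformers peft bitsandbytes
```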
## License
The license is inherited from the base model jeffwan/llama-7b-hf.
## Documentation

### Citation

When using or referring to this model, please cite our paper:
```bibtex
@article{mxm2025acip,
  title   = {Choose Your Model Size: Any Compression by a Single Gradient Descent},
  author  = {M. Genzel and P. Putzky and P. Zhao and S. Schulze and M. Mollenhauer and R. Seidel and S. Dietzel and T. Wollmann},
  year    = {2025},
  journal = {Preprint arXiv:2502.01717}
}
```
## Information Table

| Property | Details |
|----------|---------|
| Model Type | ACIP applied to jeffwan/llama-7b-hf |
| Training Data | allenai/c4 |
| Metrics | perplexity, accuracy |
| Tags | acip, pytorch |
| Base Model | jeffwan/llama-7b-hf |
| Pipeline Tag | text-generation |
| Library Name | transformers |
| License | other |