# ACIP applied to jeffwan/llama-7b-hf
This model repository, part of the ACIP Project, offers a compressible version of the `jeffwan/llama-7b-hf` model. For more details, visit our code repo.
[ GitHub | Paper | Website ]
## Quick Start

Load the ACIP model using `from_pretrained`:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("MerantixMomentum/acip_llama1_7b", trust_remote_code=True)
```
This will download and create a fully parameterized ACIP model that can be pruned to any compression rate you want. For example:

```python
model.prune_model_by_score(size_ratio=0.4)
```
This will prune the model to 40% of its original size in terms of the number of parameters, which means a 60% compression rate.
A unique feature of ACIP is that this operation is reversible: you can rerun `model.prune_model_by_score` as many times as you like to evaluate your model at different sizes, as in the sketch below.
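For example, here is a minimal sketch of such a sweep; the `evaluate_perplexity` helper is hypothetical and stands in for whatever evaluation routine you use:

```python
# Sweep several target sizes with the same in-memory model; pruning by score
# is reversible, so the model does not need to be reloaded between ratios.
for ratio in [0.8, 0.6, 0.4]:
    model.prune_model_by_score(size_ratio=ratio)
    ppl = evaluate_perplexity(model)  # hypothetical helper: plug in your own eval
    print(f"size_ratio={ratio}: perplexity={ppl:.2f}")
```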
Finally, you can "commit" to a certain ratio and run:

```python
model.compress()
```
This will discard all pruned mask values of compressible linear layers.
Now the model is actually compressed, and you should notice a significant decrease in memory usage (this step is not reversible without reloading the ACIP model).
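To sanity-check the result, you can inspect the parameter count of the compressed model (a plain-PyTorch sketch with no extra dependencies):

```python
# Count the parameters remaining after compression.
num_params = sum(p.numel() for p in model.parameters())
print(f"compressed model has {num_params:,} parameters")
```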
If you want, you can also run:

```python
model.quantize()
```

to save even more memory (we have only tested 4-bit quantization with `bitsandbytes`, but you can customize this).
That's it! You can now use your compressed model for inference or fine-tuning like any other Causal Language Model from 🤗 transformers.
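For instance, a minimal inference sketch. This assumes the tokenizer is available under the same repo id and that the ACIP wrapper exposes the usual `generate` API, as the note above suggests:

```python
from transformers import AutoTokenizer

# Assumption: the tokenizer ships with the same repository.
tokenizer = AutoTokenizer.from_pretrained("MerantixMomentum/acip_llama1_7b")
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```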
## ⚠️ Important Note
The parameter `size_ratio` ranges from 1.0 to 0.0, indicating the model size after compression. For example, 0.4 means that the model has only 40% of the original number of parameters, and 1.0 means no compression at all. Alternatively, you can also set `compression_rate` in `prune_model_by_score`, which is equivalent to `size_ratio = 1.0 - compression_rate`.
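For instance, the following two calls select the same target size:

```python
# Equivalent calls: prune to 40% of the original parameter count.
model.prune_model_by_score(size_ratio=0.4)
model.prune_model_by_score(compression_rate=0.6)
```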
## Installation
To run an ACIP model from our hub, you only need minimal dependencies, namely `torch`, `transformers`, `peft`, and optionally, `bitsandbytes` if you want to quantize your model.
See requirements.txt for pip-installable dependencies with exact version pins (newer versions should work as well).
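For example, an unpinned install might look like this (`bitsandbytes` is only needed if you want to quantize):

```bash
pip install torch transformers peft bitsandbytes
```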
## License
The license is inherited from the base model jeffwan/llama-7b-hf.
## Documentation

### Citation

When using or referring to this model, please cite our paper:
```bibtex
@article{mxm2025acip,
  title   = {Choose Your Model Size: Any Compression by a Single Gradient Descent},
  author  = {M. Genzel and P. Putzky and P. Zhao and S. Schulze and M. Mollenhauer and R. Seidel and S. Dietzel and T. Wollmann},
  year    = {2025},
  journal = {Preprint arXiv:2502.01717}
}
```
## Information Table

| Property | Details |
|----------|---------|
| Model Type | ACIP applied to jeffwan/llama-7b-hf |
| Training Data | allenai/c4 |
| Metrics | perplexity, accuracy |
| Tags | acip, pytorch |
| Base Model | jeffwan/llama-7b-hf |
| Pipeline Tag | text-generation |
| Library Name | transformers |
| License | other |