# ACIP applied to Qwen/Qwen2.5-7B
This model repository, part of the ACIP Project, offers a compressible version of the Qwen/Qwen2.5-7B model. For more detailed information, please visit our code repository.

[🤗 GitHub | Paper | Website]
## Quick Start
Load the ACIP model using the `from_pretrained` method:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("MerantixMomentum/acip_qwen25_7b", trust_remote_code=True)
```
This will download and create a fully parameterized ACIP model that can be pruned to any desired compression rate. For instance:

```python
model.prune_model_by_score(size_ratio=0.4)
```

This prunes the model to 40% of its original size in terms of the number of parameters, which means a 60% compression rate. A unique feature of ACIP is that this operation is reversible: you can run `model.prune_model_by_score` multiple times to evaluate the model at different sizes, as sketched below.
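A minimal sketch of such a sweep (the evaluation at each step is left as a placeholder; plug in your own benchmark):

```python
# Try several target sizes in one session: prune_model_by_score simply
# (re)applies the scoring masks, so each call starts from the full ACIP model.
for ratio in (0.8, 0.6, 0.4):
    model.prune_model_by_score(size_ratio=ratio)
    # ... evaluate the pruned model here (e.g., perplexity or accuracy) ...
```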
Finally, you can "commit" to a specific ratio and run:

```python
model.compress()
```

This will discard all pruned mask values of compressible linear layers. Now the model is actually compressed, and you should notice a significant reduction in memory usage (this step is irreversible without reloading the ACIP model).
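To verify the reduction, you could measure the footprint around the call. This is a sketch that assumes the ACIP model exposes the standard 🤗 transformers `PreTrainedModel` interface:

```python
# get_memory_footprint() is the standard transformers helper; we assume
# the ACIP model inherits it from PreTrainedModel.
print(f"before compress: {model.get_memory_footprint() / 1e9:.2f} GB")
model.compress()
print(f"after compress:  {model.get_memory_footprint() / 1e9:.2f} GB")
```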
If you wish, you can also run:

```python
model.quantize()
```

to save even more memory (we have only tested 4-bit quantization with `bitsandbytes`, but you can customize this).
That's it! You can now use your compressed model for inference or fine-tuning, just like any other causal language model from 🤗 transformers.
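For example, a minimal generation sketch. We assume here that the model repository also ships the matching tokenizer; if not, load the tokenizer of the base model Qwen/Qwen2.5-7B instead:

```python
from transformers import AutoTokenizer

# Assumption: the ACIP repo ships the base model's tokenizer.
tokenizer = AutoTokenizer.from_pretrained("MerantixMomentum/acip_qwen25_7b", trust_remote_code=True)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```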
## ⚠️ Important Note

The `size_ratio` parameter ranges from 1.0 to 0.0, indicating the model size after compression. For example, 0.4 means that the model has only 40% of the original number of parameters, and 1.0 means no compression. Alternatively, you can set `compression_rate` in `prune_model_by_score`, which is equivalent to `size_ratio = 1.0 - compression_rate`.
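In other words, the following two calls target the same model size:

```python
# Equivalent: both prune to 40% of the original parameters.
model.prune_model_by_score(size_ratio=0.4)
model.prune_model_by_score(compression_rate=0.6)
```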
## Installation

### Dependencies

To run an ACIP model from our hub, you only need minimal dependencies, namely `torch`, `transformers`, `peft`, and, optionally, `bitsandbytes` if you want to quantize your model. See requirements.txt for pip-installable dependencies with exact version pins (newer versions should also work).
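For example, a typical setup could look like this (unpinned; see requirements.txt for the exact versions we tested):

```bash
pip install torch transformers peft
pip install bitsandbytes  # optional, only needed for model.quantize()
```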
## License

This model is released under the Apache-2.0 license.
## Documentation

### Citation

When using or referring to this model, please cite our paper:

```bibtex
@article{mxm2025acip,
  title={Choose Your Model Size: Any Compression by a Single Gradient Descent},
  author={M. Genzel and P. Putzky and P. Zhao and S. Schulze and M. Mollenhauer and R. Seidel and S. Dietzel and T. Wollmann},
  year={2025},
  journal={Preprint arXiv:2502.01717}
}
```
## Model Information

| Property | Details |
|---|---|
| License | Apache-2.0 |
| Datasets | ['allenai/c4'] |
| Languages | ['zho', 'eng', 'fra', 'spa', 'por', 'deu', 'ita', 'rus', 'jpn', 'kor', 'vie', 'tha', 'ara'] |
| Metrics | ['perplexity', 'accuracy'] |
| Tags | ['acip', 'pytorch'] |
| Base Model | Qwen/Qwen2.5-7B |
| Pipeline Tag | text-generation |
| Library Name | transformers |