# ACIP applied to Qwen/Qwen2.5-7B
This model repository, part of the ACIP Project, offers a compressible version of the Qwen/Qwen2.5-7B model. For more detailed information, please visit our code repository.

[🤗 GitHub | Paper | Website]
## Quick Start
Load the ACIP model using the `from_pretrained` method:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("MerantixMomentum/acip_qwen25_7b", trust_remote_code=True)
```
This will download and create a fully parameterized ACIP model that can be pruned to any desired compression rate. For instance:

```python
model.prune_model_by_score(size_ratio=0.4)
```

This prunes the model to 40% of its original size in terms of the number of parameters, which means a 60% compression rate. A unique feature of ACIP is that this operation is reversible: you can run `model.prune_model_by_score` multiple times to evaluate the model at different sizes, as sketched below.
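A minimal sketch of such a sweep (the evaluation at each step is left as a placeholder; plug in your own benchmark):

```python
# Try several target sizes in one session: prune_model_by_score simply
# (re)applies the scoring masks, so each call starts from the full ACIP model.
for ratio in (0.8, 0.6, 0.4):
    model.prune_model_by_score(size_ratio=ratio)
    # ... evaluate the pruned model here (e.g., perplexity or accuracy) ...
```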
Finally, you can "commit" to a specific ratio and run:

```python
model.compress()
```

This will discard all pruned mask values of compressible linear layers. Now the model is actually compressed, and you should notice a significant reduction in memory usage (this step is irreversible without reloading the ACIP model).
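To verify the reduction, you could measure the footprint around the call. This is a sketch that assumes the ACIP model exposes the standard 🤗 transformers `PreTrainedModel` interface:

```python
# get_memory_footprint() is the standard transformers helper; we assume
# the ACIP model inherits it from PreTrainedModel.
print(f"before compress: {model.get_memory_footprint() / 1e9:.2f} GB")
model.compress()
print(f"after compress:  {model.get_memory_footprint() / 1e9:.2f} GB")
```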
If you wish, you can also run:

```python
model.quantize()
```

to save even more memory (we have only tested 4-bit quantization with `bitsandbytes`, but you can customize this).
That's it! You can now use your compressed model for inference or fine-tuning, just like any other causal language model from 🤗 transformers.
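For example, a minimal generation sketch. We assume here that the model repository also ships the matching tokenizer; if not, load the tokenizer of the base model Qwen/Qwen2.5-7B instead:

```python
from transformers import AutoTokenizer

# Assumption: the ACIP repo ships the base model's tokenizer.
tokenizer = AutoTokenizer.from_pretrained("MerantixMomentum/acip_qwen25_7b", trust_remote_code=True)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```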
## ⚠️ Important Note

The `size_ratio` parameter ranges from 1.0 to 0.0, indicating the model size after compression. For example, 0.4 means that the model has only 40% of the original number of parameters, and 1.0 means no compression. Alternatively, you can set `compression_rate` in `prune_model_by_score`, which is equivalent to `size_ratio = 1.0 - compression_rate`.
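In other words, the following two calls target the same model size:

```python
# Equivalent: both prune to 40% of the original parameters.
model.prune_model_by_score(size_ratio=0.4)
model.prune_model_by_score(compression_rate=0.6)
```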
## Installation

### Dependencies

To run an ACIP model from our hub, you only need minimal dependencies, namely `torch`, `transformers`, `peft`, and, optionally, `bitsandbytes` if you want to quantize your model. See requirements.txt for pip-installable dependencies with exact version pins (newer versions should also work).
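For example, a typical setup could look like this (unpinned; see requirements.txt for the exact versions we tested):

```bash
pip install torch transformers peft
pip install bitsandbytes  # optional, only needed for model.quantize()
```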
## License

This model is released under the Apache-2.0 license.
## Documentation

### Citation

When using or referring to this model, please cite our paper:

```bibtex
@article{mxm2025acip,
  title={Choose Your Model Size: Any Compression by a Single Gradient Descent},
  author={M. Genzel and P. Putzky and P. Zhao and S. Schulze and M. Mollenhauer and R. Seidel and S. Dietzel and T. Wollmann},
  year={2025},
  journal={Preprint arXiv:2502.01717}
}
```
## Model Information

| Property | Details |
|---|---|
| License | Apache-2.0 |
| Datasets | ['allenai/c4'] |
| Languages | ['zho', 'eng', 'fra', 'spa', 'por', 'deu', 'ita', 'rus', 'jpn', 'kor', 'vie', 'tha', 'ara'] |
| Metrics | ['perplexity', 'accuracy'] |
| Tags | ['acip', 'pytorch'] |
| Base Model | Qwen/Qwen2.5-7B |
| Pipeline Tag | text-generation |
| Library Name | transformers |