# ACIP applied to Qwen/Qwen2.5-3B
This model repository is part of the ACIP Project and provides a compressible version of the Qwen/Qwen2.5-3B model, enabling flexible model compression.
[GitHub | Paper | Website]
## Quick Start
Just load the ACIP model via `from_pretrained`:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("MerantixMomentum/acip_qwen25_3b", trust_remote_code=True)
```
This will download and create a fully parameterized ACIP model that can be pruned to any compression rate you wish.
For example,

```python
model.prune_model_by_score(size_ratio=0.4)
```

will prune the model to 40% of its original size, measured in number of parameters, i.e., a 60% compression rate.
A unique feature of ACIP is that this operation is revertible: you can rerun `model.prune_model_by_score` as often as you like to evaluate your model at different sizes. Finally, you can "commit" to a certain ratio and run

```python
model.compress()
```

which will discard all pruned mask values of compressible linear layers. Now the model is actually compressed and you should observe a significant decrease in memory usage (this step is not revertible without reloading the ACIP model).
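The difference between pruning and committing can be illustrated with a toy sketch (this is *not* ACIP's actual internals, just the general idea): masking a linear layer's rows prunes it logically while the zeroed weights still occupy memory, whereas committing materializes a smaller layer without the masked rows.

```python
import torch
import torch.nn as nn

# Toy linear layer with 6 output rows; keep 4 of them.
layer = nn.Linear(in_features=8, out_features=6, bias=False)
mask = torch.tensor([1, 1, 0, 1, 0, 1], dtype=torch.bool)

# Pruned but not committed: rows are merely zeroed, the parameter
# count is unchanged, so no memory is saved yet.
pruned_weight = layer.weight * mask.unsqueeze(1)
assert pruned_weight.numel() == layer.weight.numel()

# Committed: build a smaller layer that contains only the kept rows.
compressed = nn.Linear(8, int(mask.sum()), bias=False)
with torch.no_grad():
    compressed.weight.copy_(layer.weight[mask])

print(layer.weight.numel(), compressed.weight.numel())  # 48 32
```

The committed layer is genuinely smaller, which is also why the step is not revertible: the discarded rows are gone.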
If you like, you can also run

```python
model.quantize()
```

to save even more memory (we have only tested 4-bit quantization with `bitsandbytes`, but you could also customize this).
That's it! You can now use your compressed model for inference or fine-tuning like any other causal language model from 🤗 transformers.
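Putting the steps together, an end-to-end run might look like the following sketch. The function name, prompt, and generation arguments are illustrative; we assume the tokenizer lives in the same repository as the model and that the compressed model exposes the usual `generate` API, as stated above.

```python
def run_compressed_inference(prompt: str, size_ratio: float = 0.4) -> str:
    """Sketch: load the ACIP model, commit to a size ratio, and generate text.

    Requires network access and enough memory for the 3B base model.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    model = AutoModel.from_pretrained(
        "MerantixMomentum/acip_qwen25_3b", trust_remote_code=True
    )
    model.prune_model_by_score(size_ratio=size_ratio)  # revertible
    model.compress()  # commit: actually discards pruned parameters

    tokenizer = AutoTokenizer.from_pretrained("MerantixMomentum/acip_qwen25_3b")
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=32)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example (downloads the model, so it is commented out here):
# print(run_compressed_inference("The capital of France is"))
```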
## ⚠️ Important Note
The parameter `size_ratio` ranges from 1.0 to 0.0 and indicates the model size after compression. For example, 0.4 means that the model has only 40% of the original number of parameters, and 1.0 means no compression at all. Alternatively, you can also set `compression_rate` in `prune_model_by_score`, which is equivalent to `size_ratio = 1.0 - compression_rate`.
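The relationship between the two parameters is just this arithmetic, which a small hypothetical helper (not part of the ACIP API) makes explicit:

```python
def size_ratio_from_compression_rate(compression_rate: float) -> float:
    """Convert a compression rate into the equivalent size_ratio.

    Both values live in [0.0, 1.0] and always sum to 1.0.
    """
    if not 0.0 <= compression_rate <= 1.0:
        raise ValueError("compression_rate must be in [0.0, 1.0]")
    return 1.0 - compression_rate

# A 60% compression rate keeps 40% of the parameters, so
#   model.prune_model_by_score(compression_rate=0.6)
# is equivalent to
#   model.prune_model_by_score(size_ratio=0.4)
print(size_ratio_from_compression_rate(0.6))  # 0.4
```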
## Installation
To run an ACIP model from our hub, you only need minimal dependencies, namely `torch`, `transformers`, `peft`, and, optionally, `bitsandbytes` in case you want to quantize your model. See requirements.txt for pip-installable dependencies with exact version pins (newer versions should work as well).
## License
This model is released under the apache-2.0 license.
## Documentation
When using or referring to this model, please cite our paper:
```bibtex
@article{mxm2025acip,
  title   = {Choose Your Model Size: Any Compression by a Single Gradient Descent},
  author  = {M. Genzel and P. Putzky and P. Zhao and S. Schulze and M. Mollenhauer and R. Seidel and S. Dietzel and T. Wollmann},
  year    = {2025},
  journal = {Preprint arXiv:2502.01717}
}
```
## Information Table

| Property | Details |
|---|---|
| Model Type | ACIP applied to Qwen/Qwen2.5-3B |
| Training Datasets | allenai/c4 |
| Supported Languages | zho, eng, fra, spa, por, deu, ita, rus, jpn, kor, vie, tha, ara |
| Evaluation Metrics | perplexity, accuracy |
| Tags | acip, pytorch |
| Base Model | Qwen/Qwen2.5-3B |
| Pipeline Tag | text-generation |
| Library Name | transformers |
| License | apache-2.0 |