🚀 Arcee-Blitz (24B)
Arcee-Blitz (24B) is a new Mistral-based model distilled from DeepSeek-V3. It is designed to be both fast and efficient, and can be regarded as a practical “workhorse” that handles a variety of tasks without the overhead of larger architectures.
🚀 Quick Start
The sketch below shows one way to load the model and run a short generation. For quantized variants, model details, benchmarks, and limitations, see the sections that follow.
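A minimal sketch using the Hugging Face `transformers` library, assuming the model is published under the repo id `arcee-ai/Arcee-Blitz` (verify the exact id on the Hub before use):

```python
# Minimal sketch; the repo id is an assumption, and full-precision 24B
# weights are assumed to fit on your available GPU memory in bfloat16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arcee-ai/Arcee-Blitz"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the Mistral architecture in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```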
✨ Features
- Fast and Efficient: A practical “workhorse” model that handles a variety of tasks without the overhead of larger architectures.
- Improved World Knowledge: The model shows large improvements in performance on MMLU-Pro compared to the original Mistral-Small-3, indicating a significant increase in world knowledge.
- Data Contamination Checking: The training data and pipeline were carefully examined to avoid contamination, and the model is open to further community validation and testing.
📦 Installation
Quantizations of Arcee-Blitz (24B) are available in different formats:
- GGUF quants: Available here
- AWQ quants: Available here
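As an illustration, a GGUF quant can be run locally with `llama-cpp-python`. The file name below is a placeholder for whichever quant you download (e.g. Q4_K_M); this is a sketch of the generic workflow, not an official recipe:

```python
# Illustrative only: the GGUF file path is hypothetical and depends on
# which quant you downloaded from the GGUF repository.
from llama_cpp import Llama

llm = Llama(
    model_path="./arcee-blitz-Q4_K_M.gguf",  # placeholder local file
    n_ctx=32768,       # matches the 32k context length noted under Limitations
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is logit distillation?"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```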
📚 Documentation
Model Details
| Property | Details |
|----------|---------|
| Architecture Base | Mistral-Small-24B-Instruct-2501 |
| Parameter Count | 24B |
| Distillation Data | Merged Virtuoso pipeline with Mistral architecture, hotstarting the training with over 3B tokens of pretraining distillation from DeepSeek-V3 logits |
| Fine-Tuning and Post-Training | After capturing core logits, additional fine-tuning and distillation steps were performed to enhance overall performance |
| License | Apache-2.0 |
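For readers unfamiliar with logit distillation, the sketch below shows the generic technique in PyTorch: a student is trained to match a frozen teacher's softened output distribution via KL divergence. This illustrates the general method only, not Arcee's actual training pipeline, and every name in it is hypothetical:

```python
# Generic logit-distillation step (illustration of the technique, not
# Arcee's pipeline).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * temperature**2

# Toy example with random tensors standing in for real model outputs:
vocab_size = 32
student_logits = torch.randn(4, vocab_size, requires_grad=True)
teacher_logits = torch.randn(4, vocab_size)  # frozen teacher, e.g. DeepSeek-V3
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
print(loss.item())
```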
Improving World Knowledge
Arcee-Blitz demonstrates significant improvements in performance on MMLU-Pro compared to the original Mistral-Small-3, reflecting a substantial increase in world knowledge.
Data Contamination Checking
The training data and pipeline were carefully examined to avoid contamination. While confident in the validity of the model's performance gains, the developers remain open to further community validation and testing, which is one of the key reasons for releasing these models as open-source.
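As one example of the kind of community validation invited here, a simple n-gram overlap check between training documents and benchmark items might look like the following. This is a common contamination heuristic sketched for illustration, not the pipeline the developers actually used:

```python
# Illustrative n-gram contamination heuristic (not the developers' actual
# pipeline): flag a training doc that shares any length-n token window
# with a benchmark item.
def ngrams(text, n=13):
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(train_doc, benchmark_item, n=13):
    return bool(ngrams(train_doc, n) & ngrams(benchmark_item, n))

# Toy usage with placeholder strings:
doc = "alpha beta gamma delta epsilon zeta eta theta"
item = "gamma delta epsilon zeta eta"
print(is_contaminated(doc, item, n=5))  # True: the 5-gram overlaps
```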
Benchmark Comparison
| Benchmark | mistral‑small‑3 | arcee‑blitz |
|-----------|-----------------|-------------|
| MixEval | 81.6% | 85.1% |
| GPQADiamond | 42.4% | 43.1% |
| BigCodeBench Complete | 44.4% | 45.5% |
| BigCodeBench Instruct | 34.7% | 35.9% |
| BigCodeBench Complete-hard | 16.2% | 19.6% |
| BigCodeBench Instruct-hard | 15.5% | 15.5% |
| IFEval | 77.44 | 80.60 |
| BBH | 64.46 | 65.00 |
| GPQA | 33.90 | 36.70 |
| MMLU Pro | 44.70 | 60.20 |
| MuSR | 40.90 | 50.00 |
| Math Level 5 | 12.00 | 38.60 |
Limitations
- Context Length: 32k tokens (may vary depending on the final tokenizer settings and system resources).
- Knowledge Cut-off: Training data may not reflect the latest events or developments beyond June 2024.
Ethical Considerations
- Content Generation Risks: Like any language model, Arcee-Blitz can generate potentially harmful or biased content if prompted in certain ways.
📄 License
Arcee-Blitz (24B) is released under the Apache-2.0 License. You are free to use, modify, and distribute this model in both commercial and non-commercial applications, subject to the terms and conditions of the license.
If you have questions or would like to share your experiences using Arcee-Blitz (24B), please connect with us on social media. We're excited to see what you build—and how this model helps you innovate!