🚀 DeepCoder-14B-Preview-exl2
This project is an optimized version of the DeepCoder-14B-Preview model, offering multiple EXL2 quantization variants for efficient GPU inference.
🚀 Quick Start
The DeepCoder-14B-Preview-exl2 is based on the original DeepCoder-14B-Preview by Agentica. It builds upon DeepSeek-R1-Distill-Qwen-14B from DeepSeek and the foundation model Qwen2.5-14B by Qwen.
✨ Features
Quantization Variants
Model Performance
DeepCoder-14B-Preview is a code reasoning LLM fine-tuned from DeepSeek-R1-Distill-Qwen-14B using distributed reinforcement learning (RL) to scale up to long context lengths. The model achieves 60.6% Pass@1 accuracy on LiveCodeBench v5 (8/1/24 - 2/1/25), an 8% improvement over the base model (53%), and reaches performance comparable to OpenAI's o3-mini with just 14B parameters.
Training Data
Our training dataset consists of approximately 24K unique problem-test pairs compiled from:
- Taco-Verified
- PrimeIntellect SYNTHETIC-1
- LiveCodeBench v5 (5/1/23 - 7/31/24)
Training Recipe
- GRPO+: We enhance the original GRPO algorithm with insights from DAPO to enable more stable training (the base objective is sketched after this list).
- Iterative Context Lengthening: Our original Deepscaler-1.5B-Preview scaled long-context training from 8K→16K→24K, achieving 33→38→43% on AIME respectively. Similarly, DeepCoder-14B-Preview is trained on 16K→32K, achieving 54→58% on LiveCodeBench (v5). DeepCoder-14B-Preview successfully generalizes to longer contexts when evaluated at 64K context, reaching 60.6%.
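For context on the GRPO+ bullet above, here is the standard GRPO surrogate objective that these modifications start from; this is a sketch of the vanilla formulation (the exact GRPO+ loss is described in the DeepCoder blog). Notation: for a prompt $q$, a group of $G$ responses $o_1,\dots,o_G$ is sampled from the old policy, each scored with a reward $R_i$; $\pi_{\text{ref}}$ is the reference policy.

$$
J_{\text{GRPO}}(\theta) = \mathbb{E}\left[\frac{1}{G}\sum_{i=1}^{G}\frac{1}{|o_i|}\sum_{t=1}^{|o_i|}\min\Big(r_{i,t}(\theta)\,\hat{A}_i,\ \operatorname{clip}\big(r_{i,t}(\theta),\,1-\varepsilon,\,1+\varepsilon\big)\,\hat{A}_i\Big)\right] - \beta\, D_{\mathrm{KL}}\big(\pi_\theta \,\|\, \pi_{\text{ref}}\big)
$$

$$
r_{i,t}(\theta) = \frac{\pi_\theta(o_{i,t}\mid q,\, o_{i,<t})}{\pi_{\theta_{\text{old}}}(o_{i,t}\mid q,\, o_{i,<t})}, \qquad
\hat{A}_i = \frac{R_i - \operatorname{mean}(\{R_1,\dots,R_G\})}{\operatorname{std}(\{R_1,\dots,R_G\})}
$$

DAPO-style changes, such as replacing the symmetric clip range $[1-\varepsilon,\, 1+\varepsilon]$ with asymmetric bounds $[1-\varepsilon_{\text{low}},\, 1+\varepsilon_{\text{high}}]$ ("clip-higher") and dropping the KL penalty, are the kinds of stability tweaks GRPO+ draws on.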
Evaluation
We evaluate DeepCoder-14B-Preview on various coding benchmarks, including LiveCodeBench (LCBv5), Codeforces, and HumanEval+.
| Model | LCB (v5) (8/1/24 - 2/1/25) | Codeforces Rating | Codeforces Percentile | HumanEval+ |
| --- | --- | --- | --- | --- |
| DeepCoder-14B-Preview (ours) | 60.6 | 1936 | 95.3 | 92.6 |
| DeepSeek-R1-Distill-Qwen-14B | 53.0 | 1791 | 92.7 | 92.0 |
| O1-2024-12-17 (Low) | 59.5 | 1991 | 96.1 | 90.8 |
| O3-Mini-2025-1-31 (Low) | 60.9 | 1918 | 94.9 | 92.6 |
| O1-Preview | 42.7 | 1658 | 88.5 | 89.0 |
| Deepseek-R1 | 62.8 | 1948 | 95.4 | 92.6 |
| Llama-4-Behemoth | 49.4 | - | - | - |
Serving
Our model can be served using popular high-performance inference systems:
- vLLM
- Hugging Face Text Generation Inference (TGI)
- SGLang
- TensorRT-LLM
All these systems support the OpenAI Chat Completions API format.
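As an illustration, below is a minimal sketch of querying such an OpenAI-compatible endpoint with the openai Python client, using the sampling settings recommended further down; the base URL, API key, model name, and prompt are placeholders for your own deployment.

```python
# Minimal sketch: querying an OpenAI-compatible endpoint (vLLM, TGI, SGLang,
# TensorRT-LLM, or TabbyAPI for the exl2 quants). The URL, API key, and model
# name below are assumptions -- point them at your own server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="DeepCoder-14B-Preview",
    # No system prompt -- all instructions go in the user message (see Usage Recommendations).
    messages=[{"role": "user", "content": "Write a Python function that returns the n-th Fibonacci number."}],
    temperature=0.6,
    top_p=0.95,
    max_tokens=64000,
)
print(response.choices[0].message.content)
```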
📚 Documentation
Quantization Notes
Made with Exllamav2 0.2.8 using the default calibration dataset. The quants can be used with TabbyAPI or Text-Generation-WebUI and require an RTX GPU on Windows or an RTX/ROCm GPU on Linux. RAM offloading isn't supported natively, so make sure the model fits in your GPU VRAM. I'd recommend at least a 12GB GPU for the 4-5bpw quants.
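For local experimentation outside TabbyAPI or Text-Generation-WebUI, the quants can also be loaded directly with the exllamav2 Python library. The following is a rough sketch based on the library's dynamic-generator examples; the model directory and prompt are placeholders, and argument names may differ slightly across exllamav2 versions, so treat it as a starting point rather than a verified recipe.

```python
# Rough sketch of loading an exl2 quant with the exllamav2 library (~0.2.x).
# The model directory and prompt are placeholders; check the exllamav2
# examples for the exact API of your installed version.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator, ExLlamaV2Sampler

model_dir = "/path/to/DeepCoder-14B-Preview-exl2"  # directory containing the downloaded quant
config = ExLlamaV2Config(model_dir)
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)            # allocated while auto-splitting across GPUs
model.load_autosplit(cache, progress=True)
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

# Sampling settings from the Usage Recommendations section.
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.6
settings.top_p = 0.95

output = generator.generate(
    prompt="Write a Python function that reverses a linked list.",
    gen_settings=settings,
    max_new_tokens=4096,
    add_bos=True,
)
print(output)
```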
Usage Recommendations
Our usage recommendations are similar to those for the R1 and R1 Distill series:
- Avoid adding a system prompt; all instructions should be contained within the user prompt.
- `temperature = 0.6`
- `top_p = 0.95`
- This model performs best with `max_tokens` set to at least 64000.
License
This project is released under the MIT License, reflecting our commitment to open and accessible AI development. We believe in democratizing AI technology by making our work freely available for anyone to use, modify, and build upon. This permissive license ensures that researchers, developers, and enthusiasts worldwide can leverage and extend our work without restrictions, fostering innovation and collaboration in the AI community.
Acknowledgement
Citation
```bibtex
@misc{deepcoder2025,
  title={DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level},
  author={Michael Luo, Sijun Tan, Roy Huang, Ameen Patel, Alpay Ariyak, Qingyang Wu, Xiaoxiang Shi, Rachel Xin, Colin Cai, Maurice Weber, Ce Zhang, Li Erran Li, Raluca Ada Popa, Ion Stoica},
  howpublished={\url{https://pretty-radio-b75.notion.site/DeepCoder-A-Fully-Open-Source-14B-Coder-at-O3-mini-Level-1cf81902c14680b3bee5eb349a512a51}},
  note={Notion Blog},
  year={2025}
}
```
💡 Usage Tip
Our usage recommendations are similar to those for the R1 and R1 Distill series:
- Avoid adding a system prompt; all instructions should be contained within the user prompt.
- `temperature = 0.6`
- `top_p = 0.95`
- This model performs best with `max_tokens` set to at least 64000.
⚠️ Important Note
RAM offloading isn't supported natively, so make sure the model fits in your GPU VRAM. I'd recommend at least a 12GB GPU for the 4-5bpw quants.