🚀 DeepCoder-1.5B-Preview
DeepCoder-1.5B-Preview is a code reasoning LLM that uses distributed reinforcement learning to scale up to long context lengths, democratizing reinforcement learning for LLMs.
🚀 Quick Start
This README provides an overview of DeepCoder-1.5B-Preview, including its features, data, training recipe, evaluation, serving methods, license, and acknowledgments.
✨ Features
- Code Reasoning LLM: Fine-tuned from DeepSeek-R1-Distill-Qwen-1.5B using distributed reinforcement learning to handle long context lengths.
- Improved Training Algorithm: Based on an enhanced version of GRPO (GRPO+) and iterative context lengthening.
- Good Generalization: Shows better generalization to long contexts compared to the base distilled model.
- High-Performance Serving: Can be served using popular high-performance inference systems supporting the OpenAI Chat Completions API format.
💻 Usage Examples
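The original model card ships no runnable snippet, so here is a minimal sketch using Hugging Face Transformers. The repo id `agentica-org/DeepCoder-1.5B-Preview` and the sampling settings (temperature 0.6, top-p 0.95) are assumptions; adjust them to your checkpoint and preferences.

```python
# Minimal sketch (not an official example): chat-style generation with
# Hugging Face Transformers. The repo id below is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentica-org/DeepCoder-1.5B-Preview"  # assumed HF repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user",
     "content": "Write a Python function that returns the n-th Fibonacci number."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long chains of thought, so leave generous headroom.
outputs = model.generate(
    inputs, max_new_tokens=8192, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```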
📚 Documentation
DeepCoder Overview
DeepCoder-1.5B-Preview is a code reasoning LLM fine-tuned from DeepSeek-R1-Distill-Qwen-1.5B using distributed reinforcement learning (RL) to scale up to long context lengths.
Data
Our training dataset consists of approximately 24K unique problem-test pairs compiled from the following sources (an illustrative pair is sketched after the list):
- Taco-Verified
- PrimeIntellect SYNTHETIC-1
- LiveCodeBench v5 (5/1/23–7/31/24)
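For illustration only, a problem-test pair can be thought of as a coding prompt bundled with stdin/stdout test cases; the field names below are assumptions, not the released dataset's actual schema.

```python
# Hypothetical shape of one problem-test pair; field names are illustrative,
# not the released dataset's actual schema.
example_pair = {
    "problem": "Read an integer n, then n integers; print their sum.",
    "tests": [
        {"input": "3\n1 2 3\n", "output": "6\n"},
        {"input": "1\n-5\n", "output": "-5\n"},
    ],
    "source": "Taco-Verified",
}
```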
Training Recipe
Our training recipe relies on an improved version of GRPO (GRPO+) and iterative context lengthening, introduced in DeepScaleR.
GRPO+
We enhance the original GRPO algorithm with insights from DAPO to enable more stable training; a minimal sketch of the resulting loss appears after this list:
- Offline Difficulty Filtering: Instead of DAPO's online dynamic sampling, which incurs high runtime overhead, we perform offline difficulty filtering on a subset of coding problems to keep the training dataset within a suitable difficulty range.
- No Entropy Loss: We eliminate the entropy loss entirely as it often led to instability in training.
- No KL Loss: Removing KL loss allows the LLM to move out of the trust region of the original SFT model and accelerates training by avoiding log probability computation for the reference policy.
- Overlong Filtering (from DAPO): To preserve long-context reasoning, we mask the loss for truncated sequences, enabling DeepCoder to generalize to 64K-context inference despite 32K-context training.
- Clip High (from DAPO): By increasing the upper bound in GRPO/PPO’s surrogate loss, we encourage more exploration and more stable entropy.
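The following is a minimal, illustrative sketch (not the authors' verl code) of how these changes compose into a per-token surrogate loss; the `eps_low`/`eps_high` names, the 0.28 upper bound, and the masking convention are assumptions:

```python
# Illustrative GRPO+ surrogate: asymmetric clipping ("clip high"), no entropy
# bonus, no KL penalty, and overlong filtering via the loss mask.
import torch

def grpo_plus_loss(log_probs, old_log_probs, advantages, loss_mask,
                   eps_low=0.2, eps_high=0.28):
    # Ratio of current to behavior policy, per token.
    ratio = torch.exp(log_probs - old_log_probs)
    # Clip high: the upper bound (1 + eps_high) exceeds the usual (1 + eps_low),
    # letting unlikely-but-rewarded tokens grow faster (more exploration).
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high)
    surrogate = torch.minimum(ratio * advantages, clipped * advantages)
    # Overlong filtering: loss_mask zeros out truncated sequences (and padding)
    # so the model is never penalized for running past the training context.
    # No entropy or KL terms are added, per GRPO+.
    return -(surrogate * loss_mask).sum() / loss_mask.sum().clamp(min=1)
```

In practice, `advantages` would be group-normalized rewards in the GRPO style, and the mask would come from the rollout engine.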
Iterative Context Lengthening
Our original DeepScaleR-1.5B-Preview scaled long-context training from 8K→16K→24K, achieving 33→38→43% on AIME, respectively. Similarly, DeepCoder-14B-Preview is trained on 16K→32K, achieving 54→58% on LiveCodeBench (v5), and successfully generalizes to longer contexts when evaluated at 64K context, reaching 60.6%.
DeepCoder generalizes better to long contexts than the base distilled model, due to DAPO's overlong filtering. However, its longer responses are often truncated when the max length is capped at 16K, which can lower its scores.
LiveCodeBench (v5) scores by maximum evaluation context length:

| Model | 16K | 32K | 64K |
| --- | --- | --- | --- |
| DeepCoder-14B-Preview | 45.6 | 57.9 | 60.6 |
| DeepSeek-R1-Distill-Qwen-14B | 50.2 | 53.0 | 53.0 |
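Schematically, iterative context lengthening is just staged training with a growing response-length cap; the sketch below is purely illustrative (the `train_stage` stand-in is hypothetical, not verl's interface):

```python
# Illustrative only: each stage resumes from the previous checkpoint and
# raises the maximum response length; truncated samples are loss-masked
# rather than penalized (overlong filtering).
def train_stage(checkpoint: str, max_response_length: int) -> str:
    # A hypothetical stand-in for one full RL training run.
    print(f"stage: init={checkpoint}, response cap={max_response_length} tokens")
    return f"{checkpoint}->ctx{max_response_length}"

checkpoint = "deepseek-r1-distill-qwen-base"
for cap in (16_384, 32_768):  # the 16K -> 32K schedule used for the 14B model
    checkpoint = train_stage(checkpoint, cap)
```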
A more detailed description of the training recipe can be found in our blog post.
Evaluation
We evaluate DeepCoder-1.5B-Preview on various coding benchmarks, including LiveCodeBench (LCB v5), Codeforces, and HumanEval+.
| Model | LCB (v5) (8/1/24–2/1/25) | Codeforces Rating | Codeforces Percentile | HumanEval+ |
| --- | --- | --- | --- | --- |
| DeepCoder-1.5B-Preview | 25.1 | 963 | 28.5 | 73.0 |
| DeepSeek-R1-Distill-Qwen-1.5B | 16.9 | 615 | 1.9 | 58.3 |
Serving DeepCoder
Our model can be served using popular high-performance inference systems:
- vLLM
- Hugging Face Text Generation Inference (TGI)
- SGLang
- TensorRT-LLM
All these systems support the OpenAI Chat Completions API format.
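As a sketch, assuming a locally launched server such as `vllm serve agentica-org/DeepCoder-1.5B-Preview` on the default port (the repo id, URL, and port are assumptions), any of these backends can then be queried with the standard OpenAI client:

```python
# Sketch of querying an OpenAI-compatible endpoint; base_url, api_key, and
# the model/repo id are illustrative assumptions for a local deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="agentica-org/DeepCoder-1.5B-Preview",
    messages=[{"role": "user", "content": "Implement binary search in Python."}],
    temperature=0.6,
    top_p=0.95,
    max_tokens=8192,
)
print(response.choices[0].message.content)
```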
🔧 Technical Details
The training process of DeepCoder-1.5B-Preview involves multiple technical improvements, such as the enhanced GRPO+ algorithm and iterative context lengthening. These techniques contribute to its ability to handle long-context code reasoning tasks effectively.
📄 License
This project is released under the MIT License, reflecting our commitment to open and accessible AI development. We believe in democratizing AI technology by making our work freely available for anyone to use, modify, and build upon. This permissive license ensures that researchers, developers, and enthusiasts worldwide can leverage and extend our work without restrictions, fostering innovation and collaboration in the AI community.
Acknowledgement
- Our training experiments are powered by our heavily modified fork of verl, an open-source post-training library.
- Our model is trained on top of [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B).
- Our work is done as part of Berkeley Sky Computing Lab and Berkeley AI Research.
Citation
@misc{deepcoder2025,
title={DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level},
author={Michael Luo and Sijun Tan and Roy Huang and Ameen Patel and Alpay Ariyak and Qingyang Wu and Xiaoxiang Shi and Rachel Xin and Colin Cai and Maurice Weber and Ce Zhang and Li Erran Li and Raluca Ada Popa and Ion Stoica and Tianjun Zhang},
howpublished={\url{https://pretty-radio-b75.notion.site/DeepCoder-A-Fully-Open-Source-14B-Coder-at-O3-mini-Level-1cf81902c14680b3bee5eb349a512a51}},
note={Notion Blog},
year={2025}
}