🚀 DeepSWE-Preview
Democratizing Reinforcement Learning for LLM Agents
🚀 Quick Start
DeepSWE-Preview is a fully open-sourced, state-of-the-art coding agent trained with only reinforcement learning (RL) to excel at software engineering (SWE) tasks. It demonstrates strong reasoning capabilities in navigating complex codebases and viewing/editing multiple files, and serves as a foundational model for future coding agents. The model achieves an impressive 59.0% on SWE-Bench-Verified, which is currently #1 in the open-weights category.
✨ Features
- High Performance: Achieves 59.0% on SWE-Bench-Verified, leading in the open-weights category.
- Strong Reasoning: Capable of handling complex codebases and multi-file operations.
- Open-Source: Released under the MIT License, promoting open and accessible AI development.
💡 Usage Tip
To get the best performance out of DeepSWE-Preview, we suggest setting:
- Temperature = 1
- Max tokens set to at least 32-64K.
- Use R2EGym's system/instance prompt and tools (`file_editor.py`, `execution_bash.py`, `search.py`, `finish.py`). See here for more details, and the request example below.
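As a minimal illustration, the request below applies these settings through an OpenAI-compatible endpoint (such as the vLLM deployment described later). The placeholder prompts are assumptions; in practice, use R2EGym's actual system/instance prompts and tool definitions.

```python
# Minimal sketch: query a locally served DeepSWE-Preview with the recommended
# sampling settings. The prompts are placeholders, not R2EGym's real prompts.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="agentica-org/DeepSWE-Preview",
    messages=[
        {"role": "system", "content": "<R2EGym system prompt>"},
        {"role": "user", "content": "<R2EGym instance prompt for the SWE task>"},
    ],
    temperature=1.0,   # recommended: temperature = 1
    max_tokens=65536,  # recommended: allow at least 32-64K generated tokens
)
print(response.choices[0].message.content)
```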
📚 Documentation
Training Recipe
Data
Our dataset contains 4.5K problems from a subset of R2E-Gym. To avoid data contamination during training, we filtered out problems that are derived from the same repositories as SWE-Bench-Verified, such as sympy. All problems map to individual Docker images.
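For illustration only, the decontamination step might look like the sketch below; the repository set and problem schema are placeholders, not the exact lists used to build the dataset.

```python
# Illustrative decontamination filter: drop training problems whose source
# repository also appears in SWE-Bench-Verified. The repo set and the
# problem dict schema are placeholders.
SWE_BENCH_VERIFIED_REPOS = {"sympy/sympy", "django/django"}  # placeholder entries

def decontaminate(problems: list[dict]) -> list[dict]:
    """Keep only problems whose repo does not overlap with SWE-Bench-Verified."""
    return [p for p in problems if p["repo"] not in SWE_BENCH_VERIFIED_REPOS]
```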
Environment
Our environment wraps around R2E-Gym, an existing Gym environment for scalable curation of high-quality executable SWE environments.
State & Action: R2E-Gym defines a set of four tools as part of the action space. The output of each tool (a Python program with stdout/stderr) represents the returned state. More specifically:
- Execute Bash: Outputs both stdout and stderr of an LLM-generated bash command.
- Search: Searches and returns all occurrences of an LLM-defined query in either a directory or a single file.
- File Editor: Allows for viewing, creating, replacing strings, inserting, and undoing edits to a specific file.
- Finish/Submit: The LLM signals that it has resolved the pull request, which terminates trajectory generation. A simplified tool-dispatch loop is sketched below.
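The sketch below shows one way such a loop could be wired up. The tool names, the `llm.act` interface, and the `file_editor` stub are hypothetical stand-ins, not the exact R2E-Gym API.

```python
# Simplified four-tool agent loop. Tool names, `llm.act`, and `file_editor`
# are hypothetical stand-ins for R2E-Gym's actual tool implementations.
import subprocess

def execute_bash(command: str) -> str:
    """Run an LLM-generated bash command and return its stdout and stderr."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=300)
    return result.stdout + result.stderr

def file_editor(args: dict) -> str:
    # Placeholder for file_editor.py (view/create/str_replace/insert/undo edits).
    return f"[file_editor] {args.get('command', 'view')} on {args.get('path', '?')}"

def run_episode(llm, task_prompt: str, max_steps: int = 100):
    history = [task_prompt]
    for _ in range(max_steps):
        action = llm.act(history)              # hypothetical interface returning a tool call
        if action.tool == "finish":            # agent believes the issue is resolved
            return action.args.get("patch")
        elif action.tool == "execute_bash":
            observation = execute_bash(action.args["command"])
        elif action.tool == "search":
            observation = execute_bash(f"grep -rn {action.args['query']!r} {action.args['path']}")
        else:                                  # "file_editor"
            observation = file_editor(action.args)
        history.append(observation)            # tool output becomes the next state
    return None                                # hit the step limit without submitting
```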
Reward: To keep things simple, our reward function employs a sparse Outcome Reward Model (ORM); a short illustrative sketch follows this list:
- 1: The LLM's generated patch passes a selected sample of tests (Pass2Pass and Fail2Pass) within a time limit. To accelerate training, our max time limit is 5 minutes, while the official SWE-Bench evaluation uses 30 minutes.
- 0: We assign no reward if the LLM's code fails on at least one test case or times out.
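A short sketch of this sparse reward, assuming a hypothetical `run_selected_tests` callable that stands in for the environment's test harness:

```python
# Sketch of the sparse outcome reward: 1 only if every selected test passes
# within the time limit, 0 otherwise. `run_selected_tests` is a hypothetical
# callable standing in for the environment's evaluation harness.
def outcome_reward(patch: str, instance, run_selected_tests, time_limit_s: int = 300) -> float:
    try:
        results = run_selected_tests(patch, instance, timeout=time_limit_s)
    except TimeoutError:
        return 0.0  # timed out -> no reward
    # Reward 1 only if all selected Pass2Pass and Fail2Pass tests pass.
    return 1.0 if all(r.passed for r in results) else 0.0
```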
RL Algorithm
We enhance the original GRPO algorithm, integrating insights from DAPO, Dr. GRPO, LOOP/RLOO, and our own innovations to enable stable training and improved performance. Our final algorithm consists of the following components; a condensed loss sketch follows the list:
- Clip High (DAPO): Increasing the upper bound of GRPO/PPO’s surrogate loss encourages exploration and stabilizes entropy.
- No KL Loss (DAPO): Eliminating KL loss prevents the LLM from being constrained to the trust region of the original SFT model.
- No Reward Standard Deviation (Dr.GRPO): Removing reward standard deviation removes difficulty bias in GRPO’s loss, ensuring hard and easy problems are better differentiated.
- Length Normalization (Dr.GRPO): Dividing surrogate loss by max context length removes length bias present in GRPO, which increases the length of incorrect responses.
- Leave One Out (Loop/RLOO): Removing one sample for advantage estimation reduces variance for policy gradient without introducing bias.
- Compact Filtering (Us): Inspired by DAPO, we mask the loss for trajectories that reach max context length, timeout during generation (20 minutes), or reach maximum steps.
- No Entropy Loss (Us): Entropy loss introduces higher instability and eventually leads to exponentially increasing entropy, which collapses training. Provided that the base model's token-level entropy is within 0.3-1, entropy loss is not needed.
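To make these modifications concrete, here is a condensed, simplified sketch of the resulting policy loss for a single prompt with K sampled trajectories. It assumes precomputed per-token probability ratios and illustrative clip values; it is not the exact rLLM implementation.

```python
# Condensed sketch of the modified GRPO-style loss (not the exact rLLM code).
import torch

def policy_loss(ratios, rewards, loss_mask, max_context_len,
                eps_low=0.2, eps_high=0.28):
    # ratios:    (K, T) per-token ratios pi_theta / pi_old
    # rewards:   (K,) sparse outcome rewards in {0, 1}
    # loss_mask: (K, T) zeroed for padding and for "compact filtered" trajectories
    #            (max context, generation timeout, or max steps reached)
    K = rewards.shape[0]
    # Leave-one-out baseline; no division by the reward std (Dr. GRPO).
    baseline = (rewards.sum() - rewards) / (K - 1)
    advantages = (rewards - baseline).unsqueeze(-1)             # (K, 1)
    # "Clip high" (DAPO): a larger upper clip range encourages exploration.
    clipped = torch.clamp(ratios, 1 - eps_low, 1 + eps_high)
    surrogate = torch.minimum(ratios * advantages, clipped * advantages)
    # Length normalization (Dr. GRPO): divide by the max context length.
    loss = -(surrogate * loss_mask).sum() / (K * max_context_len)
    return loss  # no KL term and no entropy bonus are added
```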
A more detailed description of the training recipe can be found in our blog post.
Evaluation
DeepSWE-Preview is evaluated via the official R2E-Gym codebase at 64K max context length and 100 max environment steps. DeepSWE's generated patches are then ported over to the official SWE-bench repo to calculate the final score. Below, we report Pass@1 accuracy averaged over 16 runs.
| Model | Scaffold | Type | SWE-Bench Verified (%) |
| --- | --- | --- | --- |
| DeepSWE-Preview (32B) | R2E-Gym | Agent + Hybrid Best@16 | 59.0 |
| DeepSWE-Preview (32B) | R2E-Gym | Agent + Hybrid Best@8 | 57.9 |
| DeepSWE-Preview (32B) | R2E-Gym | Agent | 42.2 |
| Devstral-Small (24B) | OpenHands | Agent | 46.6 |
| Openhands-LM (32B) | OpenHands | Agent (Iterative) | 37.2 |
| SWE-Agent-LM (32B) | SWE-Agent | Agent | 40.2 |
| R2EGym-Agent (32B) | R2E-Gym | Agent | 34.4 |
| Skywork-SWE (32B) | OpenHands | Agent | 38.0 |
| Skywork-SWE (32B) | OpenHands | Agent + Execution-Free Best@8 | 47.0 |
| SkyRL-Agent (14B) | OpenHands | Agent | 21.6 |
Test-time Scaling
With hybrid TTS, DeepSWE-Preview achieves 59.0%, beating the current SOTA open-weights model (SkyWork + TTS, 47%) by 12%. Using only execution-based and execution-free verifiers is still effective and can bring a 10+% performance gain.
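As a rough sketch of the best@k idea: sample several independent rollouts and select one patch using the two kinds of verifiers. The verifier callables below are hypothetical placeholders, not the exact verifiers used for DeepSWE-Preview.

```python
# Hedged sketch of hybrid best@k selection over k independent rollouts.
# `exec_verifier` and `model_verifier` are hypothetical placeholder callables.
def hybrid_best_at_k(candidate_patches, instance, exec_verifier, model_verifier):
    scored = []
    for patch in candidate_patches:
        exec_score = exec_verifier(patch, instance)    # execution-based: e.g. fraction of tests passed
        model_score = model_verifier(patch, instance)  # execution-free: e.g. verifier model score
        scored.append((exec_score, model_score, patch))
    # Prefer patches that do best under execution; break ties with the model verifier.
    scored.sort(key=lambda s: (s[0], s[1]), reverse=True)
    return scored[0][2]
```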
Serving DeepSWE-Preview
Our model can be served using popular high-performance inference systems:
- vLLM
- Hugging Face Text Generation Inference (TGI)
- SGLang
- TensorRT-LLM
All these systems support the OpenAI Chat Completions API format.
vLLM (Recommended)
We suggest using `vllm>=0.8.5` and enabling long context in vLLM to serve DeepSWE-Preview.
export MAX_CONTEXT_LEN=65536
export TENSOR_PARALLEL_SIZE=8
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve agentica-org/DeepSWE-Preview --tensor-parallel-size $TENSOR_PARALLEL_SIZE --max-model-len $MAX_CONTEXT_LEN --hf-overrides "{\"max_position_embeddings\": $MAX_CONTEXT_LEN}" --enable_prefix_caching
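A quick sanity check against the served endpoint might look like this, assuming vLLM's default port 8000 and its OpenAI-compatible API:

```python
# List served model IDs to confirm the server started by the command above is up.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
print([m.id for m in client.models.list().data])
```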
📄 License
This project is released under the MIT License, reflecting our commitment to open and accessible AI development. We believe in democratizing AI technology by making our work freely available for anyone to use, modify, and build upon. This permissive license ensures that researchers, developers, and enthusiasts worldwide can leverage and extend our work without restrictions, fostering innovation and collaboration in the AI community.
Acknowledgement
- Our training experiments are powered by rLLM, which builds on top of Verl, an open-source RLHF library.
- Our model is trained on top of Qwen/Qwen3-32B.