OpenHands Critic 32B Exp 20250417

Developed by all-hands
A critic model fine-tuned from Qwen2.5-Coder-32B-Instruct that evaluates the quality of code solutions, helping achieve state-of-the-art (SOTA) results on the SWE-Bench benchmark
Downloads 194
Release Time: 4/16/2025

Model Overview

A critic model designed for software engineering tasks. It assesses code patch quality using a temporal difference (TD) learning objective and supports selecting the best of multiple candidate trajectories

Model Features

Inference-Time Scaling Optimization
Generating multiple solutions and selecting the best one improved SWE-Bench Verified performance from 60.6% to 66.4%
Temporal Difference Learning
Uses a TD learning objective to propagate unit test signals backward across the entire trajectory for precise reward prediction
Real-World Generalization
Unlike prompt-engineering approaches, the trained critic model can generalize to software engineering scenarios beyond SWE-Bench
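
The temporal difference idea described above can be illustrated with a minimal sketch. It assumes a single terminal reward (for example 1.0 if the unit tests pass, 0.0 otherwise), zero reward at every earlier step, and a TD(lambda)-style blend of bootstrapped value estimates. The function name td_lambda_targets, the parameters gamma and lam, and their default values are hypothetical choices for illustration and are not taken from the model's actual training code.

```python
from typing import List, Sequence


def td_lambda_targets(
    values: Sequence[float],
    final_reward: float,
    gamma: float = 0.99,  # assumed discount factor, not from the source
    lam: float = 0.9,     # assumed TD(lambda) mixing weight, not from the source
) -> List[float]:
    """Compute per-step regression targets for one trajectory.

    values[t] is the critic's current estimate at step t. The only reward
    is final_reward, received after the last step; all intermediate
    rewards are assumed to be zero.
    """
    n = len(values)
    if n == 0:
        return []

    targets = [0.0] * n
    # The last step is trained directly on the unit test outcome.
    targets[-1] = final_reward

    # Walk backward so each step blends the next step's value estimate
    # with the next step's target (the lambda-return recursion).
    for t in range(n - 2, -1, -1):
        blended = (1.0 - lam) * values[t + 1] + lam * targets[t + 1]
        targets[t] = gamma * blended
    return targets


if __name__ == "__main__":
    # Three-step trajectory whose final patch passes the tests.
    print(td_lambda_targets([0.2, 0.4, 0.7], final_reward=1.0))
```

With lam set to 1.0 the targets reduce to the discounted final reward at every step, and with lam set to 0.0 each step is trained only on the next step's value estimate; values in between trade off the two.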

Model Capabilities

Code Quality Assessment
Multi-Solution Selection
Software Issue Resolution
Unit Test Pass Rate Prediction
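
Multi-solution selection can be sketched as a best-of-n loop: score each candidate patch with the critic and keep the highest-scoring one. In the sketch below, select_best and the score callable are hypothetical names; in practice the scorer would wrap a call to the critic model, whose actual interface is not described here, so a stand-in function is used instead.

```python
from typing import Callable, Sequence


def select_best(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the candidate with the highest critic score.

    `score` is a placeholder for a function that queries the critic model
    and returns a predicted reward for one candidate patch.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    # max() keeps the first candidate in case of a tie.
    return max(candidates, key=score)


if __name__ == "__main__":
    # Stand-in scorer with made-up scores, for illustration only.
    fake_scores = {"patch_a": 0.31, "patch_b": 0.87, "patch_c": 0.55}
    print(select_best(list(fake_scores), fake_scores.__getitem__))
```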

Use Cases

Software Development Assistance
SWE-Bench Issue Resolution
Evaluates the quality of code patches for real GitHub issues
Achieved a 66.4% pass rate on the SWE-Bench Verified benchmark
Programming Agent Optimization
Provides intermediate reward signals for OpenHands agents
Supports real-time error recovery and single-step lookahead sampling