🚀 AReaL: Ant Reasoning Reinforcement Learning for LLMs
AReaL is an open-source, fully asynchronous reinforcement learning training system for large reasoning models. It aims to help users easily and affordably build their own AI agents, and we release all the details needed to reproduce our results.
🚀 Quick Start
Local Training
Train Qwen3 1.7B locally:
```bash
bash examples/run_async_ppo.sh
```
Evaluation
```bash
cd evaluation

# Math benchmarks (AIME 2024/2025)
python eval_and_aggregate.py \
    --model_path ${MODEL_PATH} \
    --output_path ${OUTPUT_PATH} \
    --data_names aime24,aime25 \
    --max_gen_tokens 32768

# Coding benchmarks (Codeforces, LiveCodeBench v5)
python eval_and_aggregate.py \
    --model_path ${MODEL_PATH} \
    --output_path ${OUTPUT_PATH} \
    --data_names codeforces,lcb_v5 \
    --prompt_type qwen3-think-pure \
    --temperature 1.0
```
✨ Features
[NEW] Asynchronous RL
With algorithm-system co-design, AReaL supports fully asynchronous RL for the fastest training! It also provides experimental support for multi-turn agentic RL.
Open & Reproducible
We continuously release all code, datasets, and training recipes for RL training of LLMs.
Scalability
AReaL can seamlessly adapt to different computational resource settings, from a single node to 1K GPUs.
Cutting-Edge Performance
AReaL can produce models with cutting - edge reasoning capabilities in math and coding. The team is also actively working on agentic tasks.
📚 Documentation
Release Highlights (AReaL-boba²)
- Fully Asynchronous RL Training Pipeline: achieves over a 2.77x speedup with no performance drop. Benchmark scripts and instructions are provided.
- SOTA Code Generation Model: based on Qwen3, achieves SOTA results on the LiveCodeBench, Codeforces, and CodeContests benchmarks.
| Model (8B) | LiveCodeBench v5 (2024.10-2025.2) | Codeforces (rating/percentile) | CodeContests |
| --- | --- | --- | --- |
| Qwen3-8B | 58.8 | 1879/96.7% | 31.4 |
| DeepSeek-R1-0528-Qwen3-8B | 58.4 | 1945/97.3% | 31.0 |
| [AReaL-boba²-8B-Open](https://huggingface.co/inclusionAI/AReaL-boba-2-8B-subset) | 62.0 | 1933/97.2% | 41.4 |
| [AReaL-boba²-8B](https://huggingface.co/inclusionAI/AReaL-boba-2-8B) | 63.0 | 1962/97.5% | 40.8 |

| Model (14B) | LiveCodeBench v5 (2024.10-2025.2) | Codeforces (rating/percentile) | CodeContests |
| --- | --- | --- | --- |
| Qwen3-14B | 65.4 | 1978/97.7% | 38.3 |
| DeepCoder-14B-Preview | 60.6 | 1936/95.3% | 40.1 |
| [AReaL-boba²-14B-Open](https://huggingface.co/inclusionAI/AReaL-boba-2-14B-subset) | 67.3 | 1990/97.8% | 46.2 |
| [AReaL-boba²-14B](https://huggingface.co/inclusionAI/AReaL-boba-2-14B) | 69.1 | 2044/98.2% | 46.1 |

| Larger Models | LiveCodeBench v5 (2024.10-2025.2) | Codeforces (rating) | CodeContests |
| --- | --- | --- | --- |
| Qwen3-235B | 70.7 | 2056 | - |
| DeepSeek-R1 | 64.3 | 2029 | - |
| OpenAI-o3-mini (Medium) | 66.3 | 2036 | - |
- Experimental Support for Multi-turn Agentic RL Training: a complete example is provided.
Overview of Asynchronous RL Training
Synchronous RL training is inefficient because every device must wait for the longest sequence in a batch to finish generating before training can proceed. AReaL adopts a fully asynchronous RL training framework that decouples generation from training: rollout workers keep generating while trainer workers update the model as soon as enough data is available, improving training throughput and reducing GPU memory fragmentation.
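Conceptually, this decoupling can be pictured as a producer/consumer loop with a bound on sample staleness. The sketch below only illustrates the idea and is not AReaL's actual API; `generate`, `train_step`, `push_weights`, `get_current_version`, and `MAX_STALENESS` are hypothetical placeholders.

```python
# A minimal, hypothetical sketch of asynchronous rollout/training decoupling.
# None of these names come from AReaL's codebase.

MAX_STALENESS = 4  # assumed bound: how many policy versions a sample may lag behind


class RolloutSample:
    def __init__(self, prompt, response, policy_version):
        self.prompt = prompt
        self.response = response
        self.policy_version = policy_version  # weights version that generated the response


def rollout_worker(prompt_stream, sample_queue, generate, get_current_version):
    """Producer: generates continuously and never blocks on the trainer."""
    for prompt in prompt_stream:
        version = get_current_version()
        response = generate(prompt)  # call into the inference engine
        sample_queue.put(RolloutSample(prompt, response, version))


def trainer_loop(sample_queue, batch_size, train_step, get_current_version, push_weights):
    """Consumer: trains on samples as they arrive, discarding overly stale ones."""
    batch = []
    while True:
        sample = sample_queue.get()
        if get_current_version() - sample.policy_version > MAX_STALENESS:
            continue  # sample was generated by weights that are now too old
        batch.append(sample)
        if len(batch) == batch_size:
            train_step(batch)   # e.g. one PPO update
            push_weights()      # asynchronously sync new weights to rollout workers
            batch.clear()
```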
SOTA Code Generation Model by AReaL - boba²
After asynchronous RL training with Qwen3 as the base model, AReaL-boba² achieves SOTA results on multiple coding benchmarks. Key features for asynchronous training are highlighted in the tutorials and code walkthroughs.
RL Training for Multi - turn Agent
AReaL-boba² allows independent customization of the dataset, rollout behavior, and training algorithm without modifying system-level code. A simple example of a multi-turn math agent for RL training is provided; a rough sketch of the idea is shown below.
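To give a flavor of what such a customization involves, here is a hypothetical sketch of a multi-turn math rollout that lets the agent retry after a wrong answer and assigns a terminal reward. The `generate` and `check_answer` callables, the `Trajectory` container, and the retry prompt are assumptions made for illustration and do not reflect AReaL's actual interfaces; see the repository example for the real implementation.

```python
from dataclasses import dataclass, field


# Hypothetical container; AReaL's actual data structures differ.
@dataclass
class Trajectory:
    turns: list = field(default_factory=list)  # list of (prompt, response) pairs
    reward: float = 0.0


def multi_turn_math_rollout(problem, generate, check_answer, max_turns=3):
    """Roll out a math agent that may retry after an incorrect answer."""
    traj = Trajectory()
    history = problem["question"]
    for _ in range(max_turns):
        response = generate(history)  # one model turn
        traj.turns.append((history, response))
        if check_answer(response, problem["answer"]):  # rule-based verifier as the reward
            traj.reward = 1.0
            break
        history += response + "\nYour answer is incorrect. Please try again.\n"
    return traj
```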
🔧 Technical Details
System Design
AReaL follows a system-algorithm co-design principle. On the system side, it efficiently syncs model parameters and controls sample staleness. On the algorithm side, it improves the PPO objective so that training remains stable on asynchronous, slightly off-policy data; one way to write such an objective is sketched below.
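As an illustration only (this is a common decoupled formulation, not necessarily the exact loss AReaL uses), one can separate the behavior policy $\pi_{\text{behav}}$ that generated a sample from a proximal policy $\pi_{\text{prox}}$ used as the clipping anchor, and correct for the mismatch with an importance weight:

$$
\mathcal{J}(\theta)=\mathbb{E}_{(s,a)\sim\pi_{\text{behav}}}\!\left[\frac{\pi_{\text{prox}}(a\mid s)}{\pi_{\text{behav}}(a\mid s)}\,\min\!\Big(r_\theta\,\hat{A},\ \operatorname{clip}\big(r_\theta,\,1-\varepsilon,\,1+\varepsilon\big)\,\hat{A}\Big)\right],\qquad r_\theta=\frac{\pi_\theta(a\mid s)}{\pi_{\text{prox}}(a\mid s)}.
$$

Bounding staleness on the system side keeps $\pi_{\text{behav}}$ close to $\pi_{\text{prox}}$, which keeps these importance weights well behaved.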
Scalability
A comparison of asynchronous RL (AReaL-boba²) with classical synchronous RL (veRL) shows that AReaL scales better in training throughput as more GPUs are added.
📦 Resources
Quickstart
Benchmark and Reproduction
- Reproduce boba² Code Models
- Model weights: [8B-code](https://huggingface.co/inclusionAI/AReaL-boba-2-8B), [14B-code](https://huggingface.co/inclusionAI/AReaL-boba-2-14B), [8B-code-open](https://huggingface.co/inclusionAI/AReaL-boba-2-8B-subset), [14B-code-open](https://huggingface.co/inclusionAI/AReaL-boba-2-14B-subset)
- Evaluation Guide
- [Training configs](https://github.com/inclusionAI/AReaL/tree/main/examples/configs/v0.3-qwen3-code) and instructions
- Scripts for Benchmark Training Throughput
Customization Guide
System Code Walkthrough
📄 Future Plan
System Development
- [x] Support for SGLang
- [x] RL training with coding problems
- [x] Asynchronous generation and RL training
- [ ] Optimizations for distributed training: expert parallelism for MoE and zero-bubble pipelining
- [ ] RL for vision-language models (VLMs)
- [x] Multi-turn agentic RL
- [ ] Function calling and tool use
Algorithm Development
- [x] RL training recipes for 1.5B and 7B models
- [x] A complete RL training recipe for 32B models
- [ ] Sample - efficient multi - task RL algorithms
- [ ] Agentic capabilities with end - to - end RL
- [ ] Stable RL training for larger MoE models
📄 License
This project is licensed under the Apache 2.0 license.
Acknowledgement
Major contributors are from the RL Lab at Ant Research and the Institute for Interdisciplinary Information Sciences, Tsinghua University. The team also thanks the Data Intelligence Lab at Ant Research and the Super Computing Technology (SCT) team at Ant Group. We appreciate pioneering work from the community, such as [ReaLHF](https://github.com/openpsi-project/ReaLHF) and others.
Citation
```bibtex
@inproceedings{mei2025real,
  author    = {Mei, Zhiyu and Fu, Wei and Li, Kaiwei and Wang, Guangju and Zhang, Huanchen and Wu, Yi},
  title     = {ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation},
  booktitle = {Proceedings of the Eighth Conference on Machine Learning and Systems, MLSys 2025, Santa Clara, CA, USA, May 12-15, 2025},
  publisher = {mlsys.org},
  year      = {2025},
}

@misc{fu2025areal,
  title         = {AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning},
  author        = {Wei Fu and Jiaxuan Gao and Xujie Shen and Chen Zhu and Zhiyu Mei and Chuyi He and Shusheng Xu and Guo Wei and Jun Mei and Jiashu Wang and Tongkai Yang and Binhang Yuan and Yi Wu},
  year          = {2025},
  eprint        = {2505.24298},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2505.24298},
}
```