🌟 Light-R1-32B-DS: Near-SOTA 32B Math Model with Only 3K Data
Light-R1-32B-DS is a near-SOTA 32B math model, achieving AIME24 and AIME25 scores of 78.1 and 65.9 respectively. Starting from DeepSeek-R1-Distill-Qwen-32B, it was further trained with only 3K SFT examples, which we have open-sourced, demonstrating the strong applicability of the released data.
🚀 Quick Start
Same as DeepSeek-R1-Distill-Qwen-32B.
✨ Features
- High Performance: Achieves near-SOTA results on math tasks, with AIME24 & AIME25 scores of 78.1 & 65.9.
- Data Efficiency: Requires only 3K SFT examples for further training.
📚 Documentation
Model Information
| Property | Details |
|---|---|
| Base Model | deepseek-ai/DeepSeek-R1-Distill-Qwen-32B |
| License | apache-2.0 |
| Library Name | transformers |
| Pipeline Tag | text-generation |
Performance Comparison
| Model | Trained From | Release Date | AIME24 | AIME25 | GPQA |
|---|---|---|---|---|---|
| DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | 25.1.20 | 72.6 | 54.9 | 62.1 |
| TinyR1-32B-Preview | DeepSeek-R1-Distill-Qwen-32B | 25.2.25 | 77.1 | 65.9 | 65.0 |
| Light-R1-32B-DS (ours) 🤗 | DeepSeek-R1-Distill-Qwen-32B | 25.3.12 | 78.1 | 65.9 | 68.0 |
| Light-R1-32B (ours) 🤗 | Qwen2.5-32B-Instruct | 25.3.4 | 76.6 | 64.6 | 61.8 |
| QwQ-32B | N/A | 25.3.6 | 78.5 | 69.3 | 67.7 |
Technical Report
GitHub Page
https://github.com/Qihoo360/Light-R1
Paper
🔧 Technical Details
Data Decontamination
We carefully evaluated data contamination in several open-sourced datasets. While some contamination may be inevitable during pre-training, it is unacceptable for post-trained models to be compared on contaminated benchmarks. MATH-500 is somewhat compromised, with tens of questions that are identical to training data or differ only in their numbers. AIME 24 and 25 remain intact, but special attention is needed when incorporating AIME data up to 2023. Light-R1 performed thorough decontamination with exact matching (excluding digits) and N-gram (N=32) matching.
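The two checks above can be sketched as follows. This is a minimal illustration, not the released implementation: the normalization rules and helper names (`normalize`, `ngrams`, `is_contaminated`) are our own assumptions.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and strip digits and non-letters, so that questions
    that are identical or differ only in their numbers collide on
    exact match."""
    return re.sub(r"[^a-z]", "", re.sub(r"\d+", "", text.lower()))


def ngrams(text: str, n: int = 32) -> set:
    """Word-level n-grams of the lowercased text (empty if shorter than n)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def is_contaminated(sample: str, benchmark: list, n: int = 32) -> bool:
    """Flag a training sample that exactly matches a benchmark question
    after digit-excluding normalization, or that shares any n-gram
    (default N=32) with one."""
    bench_exact = {normalize(q) for q in benchmark}
    if normalize(sample) in bench_exact:
        return True
    sample_grams = ngrams(sample, n)
    return any(sample_grams & ngrams(q, n) for q in benchmark)
```

Exact matching with digits excluded catches the "only numbers changed" variants observed in MATH-500, while the 32-gram check catches long verbatim overlaps that survive paraphrased framing.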
📄 License
This project is licensed under the Apache-2.0 license.
📖 Citation
@misc{lightr1proj,
  title={Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond},
  author={Liang Wen and Yunke Cai and Fenrui Xiao and Xin He and Qi An and Zhenyu Duan and Yimin Du and Junchen Liu and Lifu Tang and Xiaowei Lv and Haosheng Zou and Yongchao Deng and Shousheng Jia and Xiangzheng Zhang},
  year={2025},
  url={https://github.com/Qihoo360/Light-R1},
}