🌟 Light-R1-32B-DS: Near-SOTA 32B Math Model with Only 3K Data
Light-R1-32B-DS is a near-SOTA 32B math model, achieving AIME24 and AIME25 scores of 78.1 and 65.9 respectively. Starting from DeepSeek-R1-Distill-Qwen-32B, it was further trained with only 3K SFT examples, which we have open-sourced, demonstrating the strong applicability of the released data.
🚀 Quick Start
Same as DeepSeek-R1-Distill-Qwen-32B.
✨ Features
- High Performance: Achieves near-SOTA results on math tasks, with AIME24 & AIME25 scores of 78.1 & 65.9.
- Data Efficiency: Requires only 3K SFT examples for further training.
📚 Documentation
Model Information
| Property | Details |
|---|---|
| Base Model | deepseek-ai/DeepSeek-R1-Distill-Qwen-32B |
| License | apache-2.0 |
| Library Name | transformers |
| Pipeline Tag | text-generation |
Performance Comparison
| Model | Trained From | Release Date | AIME24 | AIME25 | GPQA |
|---|---|---|---|---|---|
| DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | 25.1.20 | 72.6 | 54.9 | 62.1 |
| TinyR1-32B-Preview | DeepSeek-R1-Distill-Qwen-32B | 25.2.25 | 77.1 | 65.9 | 65.0 |
| Light-R1-32B-DS (ours) 🤗 | DeepSeek-R1-Distill-Qwen-32B | 25.3.12 | 78.1 | 65.9 | 68.0 |
| Light-R1-32B (ours) 🤗 | Qwen2.5-32B-Instruct | 25.3.4 | 76.6 | 64.6 | 61.8 |
| QwQ-32B | N/A | 25.3.6 | 78.5 | 69.3 | 67.7 |
Technical Report
GitHub Page
https://github.com/Qihoo360/Light-R1
Paper
🔧 Technical Details
Data Decontamination
We carefully evaluated data contamination in several open-sourced datasets. While some contamination may be inevitable during pre-training, it is unacceptable for post-trained models to be compared on contaminated benchmarks. MATH-500 is somewhat compromised, with tens of questions that are identical to training data or differ only in their numbers. AIME 24 and 25 remain intact, but special attention is needed when incorporating AIME data up to 2023. Light-R1 performed thorough decontamination with exact matching (excluding digits) and N-gram (N=32) matching.
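The two checks above can be sketched as follows. This is a minimal illustration, not the released implementation: the normalization rules and helper names (`normalize`, `ngrams`, `is_contaminated`) are our own assumptions.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and strip digits and non-letters, so that questions
    that are identical or differ only in their numbers collide on
    exact match."""
    return re.sub(r"[^a-z]", "", re.sub(r"\d+", "", text.lower()))


def ngrams(text: str, n: int = 32) -> set:
    """Word-level n-grams of the lowercased text (empty if shorter than n)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def is_contaminated(sample: str, benchmark: list, n: int = 32) -> bool:
    """Flag a training sample that exactly matches a benchmark question
    after digit-excluding normalization, or that shares any n-gram
    (default N=32) with one."""
    bench_exact = {normalize(q) for q in benchmark}
    if normalize(sample) in bench_exact:
        return True
    sample_grams = ngrams(sample, n)
    return any(sample_grams & ngrams(q, n) for q in benchmark)
```

Exact matching with digits excluded catches the "only numbers changed" variants observed in MATH-500, while the 32-gram check catches long verbatim overlaps that survive paraphrased framing.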
📄 License
This project is licensed under the Apache-2.0 license.
📖 Citation
@misc{lightr1proj,
  title={Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond},
  author={Liang Wen and Yunke Cai and Fenrui Xiao and Xin He and Qi An and Zhenyu Duan and Yimin Du and Junchen Liu and Lifu Tang and Xiaowei Lv and Haosheng Zou and Yongchao Deng and Shousheng Jia and Xiangzheng Zhang},
  year={2025},
  url={https://github.com/Qihoo360/Light-R1},
}