Light-R1-14B-DS Open-source Mathematical Model - Free Deployment Helps Solve Various Mathematical Problems

Light R1 14B DS

Developed by qihoo360

Light-R1-14B-DS is a 14B-parameter math SOTA model trained with reinforcement learning, excelling in AIME24/25 and GPQA benchmarks.

Large Language Model

Transformers

Open Source License:Apache-2.0 #Math Reasoning SOTA #Reinforcement Learning Optimization #Long-Chain Thinking

Downloads 2,890

Release Time : 3/12/2025

Model Overview

This is a reinforcement learning model based on DeepSeek-R1-Distill-Qwen-14B, specializing in mathematical reasoning and long-chain thinking tasks, setting new records for 14B-parameter models across multiple math benchmarks.

Model Features

Reinforcement Learning with Lightweight Computing Power

Successfully implemented reinforcement learning on medium-scale models without requiring massive computing resources.

Long-Chain Thinking Capability

Observed synchronized improvement in response length and reward scores on fine-tuned models already equipped with long-chain thinking abilities.

Math Reasoning SOTA

Achieved breakthrough scores of 74.0 and 60.2 in the AIME24/25 benchmarks, respectively.

Data Purification

Employed strict data contamination detection using exact matching and N-gram matching.

Model Capabilities

Mathematical Reasoning

Long-Chain Task Processing

Complex Problem Solving

Text Generation

Use Cases

Education

Math Competition Problem Solving

Used to solve math competition problems such as AIME.

Performed excellently in the AIME24/25 benchmarks.

Complex Math Problem Solving

Solves complex math problems requiring long-chain reasoning.

Performed well on the GPQA benchmark without specialized training.

Research

Reinforcement Learning Research

Serves as a case study for reinforcement learning in medium-scale models.

First observation of ideal phenomena on fine-tuned models already equipped with long-chain thinking capabilities.

🚀 Light-R1-14B-DS: SOTA 14B Math Model with RL

Light-R1-14B-DS is the first open - source successful RL attempt on long - COT finetuned models of similar sizes under a light budget. It is also the State - Of - The - Art 14B math model, outperforming many 32B models in AIME24 and AIME25 scores.

Property	Details
Base Model	deepseek - ai/DeepSeek - R1 - Distill - Qwen - 14B
License	apache - 2.0
Pipeline Tag	text - generation
Library Name	transformers

🚀 Quick Start

Same as DeepSeek - R1 - Distill - Qwen - 14B.

✨ Features

First Open - source RL on Long - COT Finetuned Models: Light - R1 - 14B - DS is the first open - source successful RL attempt on already long - COT finetuned models of similar sizes under a light budget.
State - Of - The - Art Performance: It achieves the best performance among 14B math models, with AIME24 and AIME25 scores of 74.0 and 60.2 respectively, outperforming many 32B models.
Expected Behavior in RL Training: During RL training, there is a simultaneous increase in response length and reward score on an already long - COT finetuned model.
Good Performance without Specific Training: It performs well on GPQA without any specific training.

📊 Model Comparison

Model	Trained From	Release Date	AIME24	AIME25	GPQA
OpenThinker - 32B	Qwen2.5 - 32B - Instruct	25.2.12	66.0	50.9	61.6
DeepSeek - R1 - Distill - Qwen - 14B	Qwen2.5 - 14B	25.1.20	69.7	50.2	59.1
[Light - R1 - 14B - DS (ours) 🤗](https://huggingface.co/qihoo360/Light - R1 - 14B - DS)	DeepSeek - R1 - Distill - Qwen - 14B	25.3.12	74.0	60.2	61.7
[Light - R1 - 32B (ours) 🤗](https://huggingface.co/qihoo360/Light - R1 - 32B)	Qwen2.5 - 32B - Instruct	25.3.4	76.6	64.6	61.8

📚 Documentation

Model Origin and Improvement

Originated from DeepSeek - R1 - Distill - Qwen - 14B, Light - R1 - 14B - DS underwent long - COT RL Post - Training and achieved a new State - Of - The - Art across 14B - Math models.

Data Decontamination

We carefully evaluated data contamination of several open - sourced datasets. While certain contamination may be inevitable during pre - training, it is unacceptable for post - training to compare on benchmarks. MATH - 500 is somewhat compromised with tens of questions that are identical or only numbers changed. AIME 24 and 25 stay intact but we have to pay special attention when we incorporate AIME data up to 2023. Light - R1 did thorough decontamination with exact matching (excluding digits) and N - gram (N = 32) matching.

📄 License

This project is licensed under the apache - 2.0 license.

📖 Citation

@misc{lightr1proj,
      title={Light - R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond}, 
      author={Liang Wen, Yunke Cai, Fenrui Xiao, Xin He, Qi An, Zhenyu Duan, Yimin Du, Junchen Liu, Lifu Tang, Xiaowei Lv, Haosheng Zou, Yongchao Deng, Shousheng Jia, Xiangzheng Zhang},
      year={2025},
      eprint={},
      archivePrefix={},
      url={https://github.com/Qihoo360/Light - R1}, 
}

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご