
Light-R1-14B-DS

Developed by qihoo360
Light-R1-14B-DS is a 14B-parameter model trained with reinforcement learning. It reaches state-of-the-art math performance for its size, with strong results on the AIME24, AIME25, and GPQA benchmarks.
Release Time: 3/12/2025

Model Overview

Light-R1-14B-DS is built on DeepSeek-R1-Distill-Qwen-14B and further trained with reinforcement learning. It specializes in mathematical reasoning and long chain-of-thought tasks, and sets new records for 14B-parameter models on multiple math benchmarks.

Model Features

Reinforcement Learning on a Modest Compute Budget
Applies reinforcement learning successfully to a medium-scale (14B) model without requiring massive computing resources.
Long Chain-of-Thought Capability
Response length and reward score improved together during RL, even though the starting model had already been fine-tuned for long chain-of-thought reasoning.
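To check for the joint rise in response length and reward described above in your own RL runs, one option is to log the mean response length and mean reward at each training step, then test whether both trend upward and are positively correlated. This is a minimal sketch assuming such per-step logs exist; the function names (`slope`, `pearson`, `co_improvement`) are hypothetical and not part of any Light-R1 tooling.

```python
import math
from typing import Sequence, Tuple


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of values against step index 0, 1, 2, ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length series."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def co_improvement(
    lengths: Sequence[float], rewards: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (length slope, reward slope, length-reward correlation).

    Two positive slopes plus a positive correlation suggest that response
    length and reward are rising together over training.
    """
    if len(lengths) != len(rewards) or len(lengths) < 2:
        raise ValueError("need two equal-length series with at least 2 steps")
    return slope(lengths), slope(rewards), pearson(lengths, rewards)
```

A trend-plus-correlation check is deliberately simple; with noisy logs you may want to smooth each series (for example with a moving average) before computing it.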
State-of-the-Art Math Reasoning
Scores 74.0 on AIME24 and 60.2 on AIME25.
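The card does not state how these scores are computed. For reasoning models, AIME results are often reported as accuracy averaged over several sampled responses per problem; the sketch below shows that computation, assuming you already have a correctness flag for each sample. The function name `mean_at_k` is hypothetical.

```python
from typing import Sequence


def mean_at_k(correct: Sequence[Sequence[bool]]) -> float:
    """Accuracy in percent, averaged over k samples per problem.

    correct[i][j] is True if sample j for problem i was judged correct.
    Each problem contributes the fraction of its samples that are correct;
    the benchmark score is the mean of those fractions, times 100.
    """
    if not correct or any(len(samples) == 0 for samples in correct):
        raise ValueError("need at least one problem with at least one sample")
    per_problem = [sum(samples) / len(samples) for samples in correct]
    return 100.0 * sum(per_problem) / len(per_problem)
```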
Data Decontamination
Training data was strictly checked for contamination against evaluation benchmarks using both exact matching and N-gram matching.
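The two checks named above can be sketched as follows: a training sample is dropped if its normalized text exactly matches a benchmark item, or if it shares any word N-gram with one. The normalization rule (lowercase, alphanumeric tokens only), the default `n = 8`, and all function names are illustrative assumptions; the card does not specify the authors' settings.

```python
import re
from typing import Iterable, List, Set, Tuple

Gram = Tuple[str, ...]


def normalize(text: str) -> List[str]:
    """Lowercase and keep only alphanumeric tokens (assumed normalization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens: List[str], n: int) -> Set[Gram]:
    """All contiguous word n-grams; empty if fewer than n tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_index(benchmark: Iterable[str], n: int) -> Tuple[Set[str], Set[Gram]]:
    """Collect exact-match keys and n-grams from every benchmark item."""
    exact: Set[str] = set()
    grams: Set[Gram] = set()
    for item in benchmark:
        tokens = normalize(item)
        exact.add(" ".join(tokens))
        grams |= ngrams(tokens, n)
    return exact, grams


def is_contaminated(sample: str, exact: Set[str], grams: Set[Gram], n: int) -> bool:
    tokens = normalize(sample)
    if " ".join(tokens) in exact:  # exact match after normalization
        return True
    return not ngrams(tokens, n).isdisjoint(grams)  # any shared n-gram


def decontaminate(train: Iterable[str], benchmark: Iterable[str], n: int = 8) -> List[str]:
    """Keep only training samples that pass both checks (n=8 is hypothetical)."""
    exact, grams = build_index(benchmark, n)
    return [s for s in train if not is_contaminated(s, exact, grams, n)]
```

Smaller `n` catches more paraphrased overlap at the cost of more false positives; the exact-match check still covers items shorter than `n` tokens.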

Model Capabilities

Mathematical Reasoning
Long Chain-of-Thought Reasoning
Complex Problem Solving
Text Generation

Use Cases

Education
Math Competition Problem Solving
Solves math competition problems such as those from AIME.
Performs strongly on the AIME24 and AIME25 benchmarks.
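AIME answers are integers from 0 to 999, so a response can be graded by extracting its final answer and comparing numerically. This sketch assumes the model is prompted to put its final answer inside `\boxed{...}`, a common convention for DeepSeek-R1-family models that this card does not itself state; `last_boxed` and `aime_correct` are hypothetical helper names.

```python
from typing import Optional


def last_boxed(text: str) -> Optional[str]:
    r"""Return the contents of the last \boxed{...}, handling nested braces."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    begin = i = start + len(marker)
    depth = 1
    while i < len(text):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i].strip()
        i += 1
    return None  # unbalanced braces: treat as no answer


def aime_correct(response: str, reference: int) -> bool:
    """Compare the boxed answer to an integer AIME reference answer."""
    answer = last_boxed(response)
    if answer is None:
        return False
    try:
        return int(answer) == reference  # "033" and "33" both match 33
    except ValueError:
        return False  # non-integer answers never match
```

Using the last `\boxed{}` rather than the first tolerates long reasoning traces that box intermediate results before the final answer.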
Complex Math Problem Solving
Solves complex math problems that require long chains of reasoning.
Also performs well on GPQA, a graduate-level science question benchmark, without training specialized for it.
Research
Reinforcement Learning Research
Serves as a case study in applying reinforcement learning to medium-scale models.
Reported as the first observation of response length and reward rising together during RL on a model already fine-tuned for long chain-of-thought reasoning.