AIbase

# Mathematical Reasoning Reinforcement Learning

Nano Aha Moment 3b
A 3-billion-parameter language model trained with reinforcement learning to solve mathematical reasoning tasks, in particular the Countdown game.
Large Language Model Transformers
McGill-NLP
55
2
OREAL 32B SFT
Apache-2.0
OREAL-32B-SFT is a supervised fine-tuned model based on Qwen2.5-32B, designed for mathematical reasoning tasks. It serves as the initial policy model for the OREAL reinforcement learning framework.
Large Language Model Transformers
internlm
18
5