MiMo-7B-RL

Developed by XiaomiMiMo
MiMo-7B-RL is a reinforcement learning (RL) model trained from the MiMo-7B-SFT model. It performs strongly on mathematical and code reasoning tasks, with results comparable to OpenAI o1-mini.
Downloads 11.79k
Release Time: 4/29/2025

Model Overview

A 7B-parameter language model optimized for reasoning tasks. Reinforcement learning training gives it strong performance in mathematics and programming.

Model Features

Reasoning-Oriented Pretraining
Uses a three-stage data mixing strategy and diverse synthetic reasoning data during pretraining to strengthen the model's reasoning capabilities
Multi-Token Prediction
Introduces multi-token prediction (MTP) as an auxiliary training objective to improve model performance and accelerate inference
Test Difficulty-Driven Rewards
Uses a fine-grained reward mechanism for high-difficulty coding problems, so that the policy is optimized with a dense reward signal instead of a sparse pass/fail one
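
As a rough illustration of the difference between a sparse and a dense code reward, here is a minimal Python sketch. The page does not give the actual scoring scheme, so everything here is an assumption for illustration: the `CaseResult` type, the per-test `difficulty` weight, and the rule that the reward is the difficulty-weighted share of passed tests.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Outcome of one test case (hypothetical structure)."""
    passed: bool
    difficulty: float  # assumed weight; larger means a harder test


def sparse_reward(results):
    """All-or-nothing reward: 1.0 only if every test case passes."""
    return 1.0 if results and all(r.passed for r in results) else 0.0


def difficulty_weighted_reward(results):
    """Dense reward: difficulty-weighted share of passed tests, in [0, 1]."""
    total = sum(r.difficulty for r in results)
    if total == 0:
        return 0.0
    earned = sum(r.difficulty for r in results if r.passed)
    return earned / total


# A solution that passes the two easier tests but fails the hardest one.
results = [
    CaseResult(passed=True, difficulty=1.0),
    CaseResult(passed=True, difficulty=2.0),
    CaseResult(passed=False, difficulty=5.0),
]

print(sparse_reward(results))               # 0.0
print(difficulty_weighted_reward(results))  # 0.375
```

With the all-or-nothing reward, a partially correct solution earns nothing, which gives the policy little to learn from on hard problems. The weighted variant still gives credit for the tests that pass.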

Model Capabilities

Mathematical problem solving
Code generation and completion
Logical reasoning
Complex problem decomposition

Use Cases

Education
Math competition problem solving
Solving AIME and other math competition problems
Achieved 68.2% accuracy on AIME 2024 and 55.4% on AIME 2025
Programming
Programming problem solving
Solving programming problems on LiveCodeBench
Achieved 57.8% accuracy on LiveCodeBench v5 and 49.3% on LiveCodeBench v6
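
The page reports these scores as "accuracy" without naming the exact metric. Assuming they are pass@1-style scores averaged over problems, which is a common convention for such benchmarks, the computation might look like the sketch below. The sample counts are made up for illustration and are not MiMo results.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(per_problem, k=1):
    """Mean pass@k over all problems, as a percentage."""
    scores = [pass_at_k(n, c, k) for n, c in per_problem]
    return 100.0 * sum(scores) / len(scores)


# Hypothetical (samples drawn, samples correct) pairs for four problems.
samples = [(4, 4), (4, 2), (4, 0), (4, 1)]

print(benchmark_score(samples, k=1))  # 43.75
```

For k = 1 the estimate reduces to the fraction of correct samples per problem, so the benchmark score is the mean of those fractions.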