# MiMo-7B: Unlocking the Reasoning Potential of Language Model
This project presents the MiMo-7B series of models, trained from scratch for reasoning tasks. These models show extraordinary reasoning potential and deliver superior performance on mathematics and code reasoning tasks.
## Quick Start
This README provides an overview of the MiMo-7B series of models, including their pre-training and post-training strategies, evaluation results, and deployment methods. The models are available on HuggingFace and ModelScope; follow the deployment instructions below to use them.
## Features

### Pre-Training: Base Model Born for Reasoning
- Optimize the data preprocessing pipeline, enhance text extraction toolkits, and apply multi-dimensional data filtering to increase the density of reasoning patterns in pre-training data. Generate massive diverse synthetic reasoning data using multiple strategies.
- Adopt a three-stage data mixture strategy for pre-training. MiMo-7B-Base is pre-trained on approximately 25 trillion tokens.
- Incorporate Multiple-Token Prediction (MTP) as an additional training objective to enhance model performance and accelerate inference (a toy sketch of such an objective follows this list).
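For intuition, here is a minimal PyTorch sketch of what a multiple-token prediction objective can look like. It assumes a single extra linear head predicting the token two positions ahead; MiMo's actual MTP layers are more elaborate, so treat this as an illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mtp_loss(hidden, lm_head, mtp_head, labels):
    """Toy MTP objective: each position predicts both the next token
    (standard LM loss) and the token after it (extra MTP loss)."""
    # Next-token loss: hidden state at position t predicts token t+1.
    ntp_logits = lm_head(hidden[:, :-1])
    ntp = F.cross_entropy(ntp_logits.flatten(0, 1), labels[:, 1:].flatten())
    # MTP loss: the same hidden state also predicts token t+2.
    mtp_logits = mtp_head(hidden[:, :-2])
    mtp = F.cross_entropy(mtp_logits.flatten(0, 1), labels[:, 2:].flatten())
    return ntp + mtp  # the relative weighting is a free design choice

# Toy demo with random tensors.
B, S, D, V = 2, 8, 16, 100
loss = mtp_loss(torch.randn(B, S, D), nn.Linear(D, V), nn.Linear(D, V),
                torch.randint(0, V, (B, S)))
print(loss.item())
```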
### Post-Training Recipe: Pioneering Reasoning Model
- Curate 130K mathematics and code problems as RL training data, verifiable by rule-based verifiers. Each problem is carefully cleaned and its difficulty assessed. Only rule-based accuracy rewards are used, to avoid potential reward hacking.
- Introduce a test-difficulty-driven code reward to mitigate the sparse-reward issue on challenging code problems: test cases of varying difficulty receive fine-grained scores so the policy can be optimized effectively (see the sketch after this list).
- Implement a data re-sampling strategy for easy problems to enhance rollout sampling efficiency and stabilize policy updates, especially in the later phases of RL training.
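To make the test-difficulty-driven reward concrete, below is a minimal sketch in which each test case carries a difficulty weight and partial credit accrues per passed test. The weighting scheme is an assumption for illustration; the exact scores MiMo assigns per difficulty level are not reproduced here.

```python
def code_reward(passed: list[bool], difficulty: list[float]) -> float:
    """Toy difficulty-weighted code reward.

    Instead of an all-or-nothing signal (1 only if every test passes),
    each passed test contributes credit proportional to its difficulty,
    giving the policy a dense learning signal on hard problems.
    """
    total = sum(difficulty)
    earned = sum(w for ok, w in zip(passed, difficulty) if ok)
    return earned / total  # in [0, 1]

# A solution that passes the three easier tests but fails the hardest one
# still receives a graded reward instead of zero.
print(code_reward([True, True, True, False], [1.0, 1.0, 2.0, 4.0]))  # 0.5
```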
### RL Infrastructure
- Develop a Seamless Rollout Engine to accelerate RL training and validation. It integrates continuous rollout, asynchronous reward computation, and early termination to minimize GPU idle time, achieving $2.29\times$ faster training and $1.96\times$ faster validation (a simplified sketch follows this list).
- Support MTP in vLLM and enhance the robustness of the inference engine in the RL system.
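The sketch below shows the core idea of overlapping rollout generation with asynchronous reward computation so that the generator never blocks on verification. All function names are hypothetical stand-ins; the real engine additionally implements continuous rollout and early termination, which are omitted here.

```python
import asyncio

async def generate_rollout(prompt: str) -> str:
    await asyncio.sleep(0.1)   # stand-in for a call to the inference engine
    return f"<completion for {prompt}>"

async def compute_reward(completion: str) -> float:
    await asyncio.sleep(0.05)  # stand-in for a rule-based verifier
    return 1.0 if len(completion) % 2 == 0 else 0.0

async def main() -> None:
    reward_tasks = []
    for prompt in (f"problem-{i}" for i in range(4)):
        completion = await generate_rollout(prompt)
        # Schedule reward computation without awaiting it, so the next
        # rollout starts immediately and GPU idle time is minimized.
        reward_tasks.append(asyncio.create_task(compute_reward(completion)))
    rewards = await asyncio.gather(*reward_tasks)
    print(rewards)

asyncio.run(main())
```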
## Installation
### SGLang Inference

```bash
python3 -m uv pip install "sglang[all] @ git+https://github.com/sgl-project/sglang.git/@main#egg=sglang&subdirectory=python"
```
### vLLM Inference

- [Recommended] Install our fork of vLLM (note: a `/tree/...` URL cannot be cloned directly; check out the branch instead):

```bash
git clone -b feat_mimo_mtp_stable_073 https://github.com/XiaomiMiMo/vllm.git
```

- For registering a vLLM loader without MTP parameters, clone this repository:

```bash
git clone https://github.com/XiaomiMiMo/MiMo.git
```
### HuggingFace Inference

```bash
pip install transformers
```
## Usage Examples
### SGLang Inference

With SGLang installed (see Installation above), launch a server:

```bash
python3 -m sglang.launch_server --model-path XiaomiMiMo/MiMo-7B-RL-Zero --host 0.0.0.0 --trust-remote-code
```

Detailed usage can be found in the SGLang documentation. MTP will also be supported within 24 hours.
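Once the server is up, you can query it over HTTP. A minimal sketch, assuming the server listens on SGLang's default port 30000 and exposes its native `/generate` endpoint (adjust host and port to your deployment):

```python
import requests

# Query the SGLang server launched above (default port assumed).
response = requests.post(
    "http://localhost:30000/generate",
    json={
        "text": "Prove that the square root of 2 is irrational.",
        "sampling_params": {"temperature": 0.6, "max_new_tokens": 512},
    },
)
print(response.json()["text"])
```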
### vLLM Inference

#### Basic Usage

```python
from vllm import LLM, SamplingParams

model_path = "/path/to/MiMo"

llm = LLM(
    model=model_path,
    trust_remote_code=True,
    num_speculative_tokens=1,  # enables MTP speculative decoding (MiMo vLLM fork)
    disable_log_stats=False,
)
sampling_params = SamplingParams(temperature=0.6)

conversation = [
    {
        "role": "system",
        "content": "",  # empty system prompt
    },
    {
        "role": "user",
        "content": "Write an essay about the importance of higher education.",
    },
]

outputs = llm.chat(conversation, sampling_params=sampling_params, use_tqdm=False)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

print("=" * 80)
```
#### Advanced Usage

For stock vLLM without MTP, first register the MiMo loader shipped in this repository:

```python
import register_mimo_in_vllm  # registry script from the MiMo repository

from vllm import LLM, SamplingParams

model_path = "/path/to/MiMo"

llm = LLM(
    model=model_path,
    trust_remote_code=True,
    # num_speculative_tokens is omitted: MTP is not used on this path
    disable_log_stats=False,
)
sampling_params = SamplingParams(temperature=0.6)

# Building the conversation and calling llm.chat() proceed exactly as in the
# basic usage example above.
```
### HuggingFace Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaomiMiMo/MiMo-7B-RL-Zero"

model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer(["Today is"], return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=100)  # max_new_tokens is an illustrative choice
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
## Documentation

### Model Details
The MTP layers of MiMo-7B are tuned during pre-training and SFT and frozen during RL. With one MTP layer used for speculative decoding, the acceptance rate is about 90%.
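As a back-of-the-envelope illustration of what that acceptance rate buys (an estimate, not a measured benchmark):

```python
# With one speculative token accepted with probability p, a decoding step
# emits 2 tokens on acceptance and 1 on rejection: 1 + p tokens on average.
p = 0.90
print(f"expected tokens per target-model step: {1 + p:.2f}")  # ~1.9
```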
| Model | Description | Download (HuggingFace) | Download (ModelScope) |
| --- | --- | --- | --- |
| MiMo-7B-Base | Base model with extraordinary reasoning potential | [XiaomiMiMo/MiMo-7B-Base](https://huggingface.co/XiaomiMiMo/MiMo-7B-Base) | [XiaomiMiMo/MiMo-7B-Base](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-Base) |
| MiMo-7B-RL-Zero | RL model trained from base model | [XiaomiMiMo/MiMo-7B-RL-Zero](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL-Zero) | [XiaomiMiMo/MiMo-7B-RL-Zero](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-RL-Zero) |
| MiMo-7B-SFT | SFT model trained from base model | [XiaomiMiMo/MiMo-7B-SFT](https://huggingface.co/XiaomiMiMo/MiMo-7B-SFT) | [XiaomiMiMo/MiMo-7B-SFT](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-SFT) |
| MiMo-7B-RL | RL model trained from SFT model, superior performance matching OpenAI o1-mini | [XiaomiMiMo/MiMo-7B-RL](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL) | [XiaomiMiMo/MiMo-7B-RL](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-RL) |
### Evaluation Results
| Benchmark | GPT-4o-0513 | Claude-3.5-Sonnet-1022 | OpenAI o1-mini | QwQ-32B-Preview | R1-Distill-Qwen-14B | R1-Distill-Qwen-7B | MiMo-7B-RL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **General** | | | | | | | |
| GPQA Diamond (Pass@1) | 49.9 | 65.0 | 60.0 | 54.5 | 59.1 | 49.1 | 54.4 |
| SuperGPQA (Pass@1) | 42.4 | 48.2 | 45.2 | 43.6 | 40.6 | 28.9 | 40.5 |
| DROP (3-shot F1) | 83.7 | 88.3 | 83.9 | 71.2 | 85.5 | 77.0 | 78.7 |
| MMLU-Pro (EM) | 72.6 | 78.0 | 80.3 | 52.0 | 68.8 | 53.5 | 58.6 |
| IF-Eval (Prompt Strict) | 84.3 | 86.5 | 84.8 | 40.4 | 78.3 | 60.5 | 61.0 |
| **Mathematics** | | | | | | | |
| MATH-500 (Pass@1) | 74.6 | 78.3 | 90.0 | 90.6 | 93.9 | 92.8 | 95.8 |
| AIME 2024 (Pass@1) | 9.3 | 16.0 | 63.6 | 50.0 | 69.7 | 55.5 | 68.2 |
| AIME 2025 (Pass@1) | 11.6 | 7.4 | 50.7 | 32.4 | 48.2 | 38.8 | 55.4 |
| **Code** | | | | | | | |
| LiveCodeBench v5 (Pass@1) | 32.9 | 38.9 | 53.8 | 41.9 | 53.1 | 37.6 | 57.8 |
| LiveCodeBench v6 (Pass@1) | 30.9 | 37.2 | 46.8 | 39.1 | 31.9 | 23.9 | 49.3 |
### MiMo-7B Series
| Benchmark | MiMo-7B-Base | MiMo-7B-RL-Zero | MiMo-7B-SFT | MiMo-7B-RL |
| --- | --- | --- | --- | --- |
| **Mathematics** | | | | |
| MATH500 (Pass@1) | 37.4 | 93.6 | 93.0 | 95.8 |
| AIME 2024 (Pass@1) | 32.9 | 56.4 | 58.7 | 68.2 |
| AIME 2025 (Pass@1) | 24.3 | 46.3 | 44.3 | 55.4 |
| **Code** | | | | |
| LiveCodeBench v5 (Pass@1) | 32.9 | 49.1 | 52.3 | 57.8 |
| LiveCodeBench v6 (Pass@1) | 29.1 | 42.9 | 45.5 | 49.3 |
## Important Note

The evaluations are conducted with `temperature=0.6`.

AIME24 and AIME25 scores are averaged over 32 repetitions. LiveCodeBench v5 (20240801-20250201), LiveCodeBench v6 (20250201-20250501), GPQA-Diamond, and IF-Eval scores are averaged over 8 repetitions. MATH500 and SuperGPQA are single runs.
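For reference, "averaged over N repetitions" means the Pass@1 score is computed for each run and then averaged across runs. A minimal sketch (this helper is illustrative, not the official evaluation harness):

```python
def averaged_pass_at_1(run_results: list[list[bool]]) -> float:
    """run_results[i][j] = whether problem j was solved on run i."""
    per_run = [100.0 * sum(run) / len(run) for run in run_results]
    return sum(per_run) / len(per_run)

# e.g. two runs over a three-problem set:
print(averaged_pass_at_1([[True, False, True], [True, True, False]]))  # ~66.7
```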
## License
This model repository is licensed under the MIT License.