MiMo: Unlocking the Reasoning Potential of Language Model
This project aims to unlock the reasoning potential of language models from pretraining to posttraining, offering high-performance models for mathematics and code reasoning tasks.
Quick Start
The MiMo-7B series models are now open-source. You can download the checkpoints of the base model, the SFT model, the RL model trained from the base model, and the RL model trained from the SFT model. For detailed deployment methods, please refer to the "Deployment" section below.
Features
Highlights
- Pre-Training: Base Model Born for Reasoning
  - Optimize the data preprocessing pipeline, enhance text extraction toolkits, and apply multi-dimensional data filtering to increase the density of reasoning patterns in pre-training data. Use multiple strategies to generate a large amount of diverse synthetic reasoning data.
  - Adopt a three-stage data mixture strategy for pre-training. MiMo-7B-Base is pre-trained on approximately 25 trillion tokens.
  - Incorporate Multiple-Token Prediction (MTP) as an additional training objective to enhance model performance and accelerate inference.
- Post-Training Recipe: Pioneering Reasoning Model
  - Curate 130K mathematics and code problems as RL training data, all verifiable by rule-based verifiers. Each problem is carefully cleaned and its difficulty assessed to ensure quality. Only rule-based accuracy rewards are used to avoid potential reward hacking.
  - Introduce a test-difficulty-driven code reward to mitigate the sparse-reward issue on challenging code problems. By assigning fine-grained scores to test cases of varying difficulty, the policy can be optimized more effectively via dense reward signals (a minimal sketch follows this list).
  - Implement a data re-sampling strategy for easy problems to improve rollout sampling efficiency and stabilize policy updates, especially in the later phases of RL training.
- RL Infrastructure
  - Develop a Seamless Rollout Engine to accelerate RL training and validation. The design integrates continuous rollout, asynchronous reward computation, and early termination to minimize GPU idle time, achieving $2.29\times$ faster training and $1.96\times$ faster validation.
  - Support MTP in vLLM and enhance the robustness of the inference engine in the RL system.
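To make the test-difficulty-driven code reward concrete, here is a minimal sketch. The tier weighting scheme and the function signature are illustrative assumptions for exposition, not the exact reward used in training:

```python
# Minimal sketch of a test-difficulty-driven code reward (illustrative only;
# the tier weights and grouping are assumptions, not the actual training recipe).

def code_reward(passed: list[bool], difficulty: list[int]) -> float:
    """Return a dense reward in [0, 1] from per-test-case pass results.

    passed[i]     -- whether the solution passed test case i
    difficulty[i] -- difficulty tier of test case i (higher = harder)
    """
    if not passed:
        return 0.0
    # Weight each test case by its difficulty tier so that partial progress
    # on hard problems still yields a non-zero (dense) reward signal.
    weights = [float(d) for d in difficulty]
    total = sum(weights)
    score = sum(w for w, ok in zip(weights, passed) if ok)
    return score / total if total > 0 else 0.0


# Example: a solution that only passes the easy tests still receives a graded
# reward instead of the sparse all-or-nothing 0/1 signal.
print(code_reward(passed=[True, True, False], difficulty=[1, 1, 3]))  # 0.4
```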
Installation
This section covers model deployment rather than a traditional installation procedure. You can choose the inference method that suits your needs:
SGLang Inference
Thanks to contributions from the SGLang team, MiMo was supported in SGLang's main branch within 24 hours of release, with MTP support coming soon.
Example Script
# Install the latest SGLang from the main branch
python3 -m uv pip install "sglang[all] @ git+https://github.com/sgl-project/sglang.git/@main#egg=sglang&subdirectory=python"

# Launch the SGLang server
python3 -m sglang.launch_server --model-path XiaomiMiMo/MiMo-7B-RL --host 0.0.0.0 --trust-remote-code
Detailed usage can be found in the SGLang documentation; MTP support is expected to land shortly.
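Once the server is running, it exposes an OpenAI-compatible API, so any standard client can query MiMo. A minimal sketch, assuming SGLang's default port of 30000 (adjust if you pass --port):

```python
# Query the SGLang server through its OpenAI-compatible endpoint.
# Assumes the default SGLang port (30000); adjust if launched with --port.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="XiaomiMiMo/MiMo-7B-RL",
    messages=[
        {"role": "system", "content": ""},  # an empty system prompt is recommended
        {"role": "user", "content": "Solve: what is 17 * 24?"},
    ],
    temperature=0.6,
)
print(response.choices[0].message.content)
```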
vLLM inference
- [Recommended] We officially support inference with MiMo-MTP using our fork of vLLM.
Example script
from vllm import LLM, SamplingParams

model_path = "/path/to/MiMo"

llm = LLM(
    model=model_path,
    trust_remote_code=True,
    num_speculative_tokens=1,
    disable_log_stats=False,
)
sampling_params = SamplingParams(temperature=0.6)

conversation = [
    {
        "role": "system",
        "content": "",
    },
    {
        "role": "user",
        "content": "Write an essay about the importance of higher education.",
    },
]

outputs = llm.chat(conversation, sampling_params=sampling_params, use_tqdm=False)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

print("=" * 80)
- Or, you can register a vLLM loader for MiMo without loading MTP parameters.
You can copy `registry/register_mimo_in_vllm.py` into your working directory and import it:

import register_mimo_in_vllm

from vllm import LLM, SamplingParams

model_path = "/path/to/MiMo"

llm = LLM(
    model=model_path,
    trust_remote_code=True,
    # num_speculative_tokens=1,
    disable_log_stats=False,
)
sampling_params = SamplingParams(temperature=0.6)
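After importing the registry, generation works the same way as in the MTP example above, e.g.:

```python
# Chat generation, identical to the MTP example above.
conversation = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "Write an essay about the importance of higher education."},
]
outputs = llm.chat(conversation, sampling_params=sampling_params, use_tqdm=False)
print(outputs[0].outputs[0].text)
```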
HuggingFace inference
Example script
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaomiMiMo/MiMo-7B-RL-0530"

model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer(["Today is"], return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output.tolist()[0]))
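Since the RL checkpoints are chat models, you will typically want to run them through the tokenizer's chat template rather than raw text completion. A minimal sketch, assuming the tokenizer ships a chat template (the empty system prompt follows the recommendation below):

```python
# Chat-style generation via the tokenizer's chat template (illustrative sketch).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaomiMiMo/MiMo-7B-RL-0530"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": ""},  # an empty system prompt is recommended
    {"role": "user", "content": "Write an essay about the importance of higher education."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```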
Documentation
Updates
[2025.05.30] By continuously extending the RL training window size from 32K to 48K, the performance of MiMo-7B-RL-0530 on AIME24 improves steadily and eventually surpasses that of DeepSeek R1.
| Benchmark | MiMo-7B-RL | MiMo-7B-RL-0530 |
|---|---|---|
| Mathematics | | |
| MATH500 (Pass@1) | 95.8 | 97.2 |
| AIME 2024 (Pass@1) | 68.2 | 80.1 |
| AIME 2025 (Pass@1) | 55.4 | 70.2 |
| Code | | |
| LiveCodeBench v5 (Pass@1) | 57.8 | 60.9 |
| LiveCodeBench v6 (Pass@1) | 49.3 | 52.2 |
| STEM | | |
| GPQA-Diamond (Pass@1) | 54.4 | 60.6 |
| General | | |
| AlignBench 1.1 (Evaluated by GPT-4.1) | 6.9 | 7.4 |
Model Details
The MTP layers of MiMo-7B are tuned during pre-training and SFT and frozen during RL. With one MTP layer for speculative decoding, the acceptance rate is approximately 90%.
| Model | Description | Download (HuggingFace) | Download (ModelScope) |
|---|---|---|---|
| MiMo-7B-Base | Base model with extraordinary reasoning potential | [HuggingFace](https://huggingface.co/XiaomiMiMo/MiMo-7B-Base) | [ModelScope](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-Base) |
| MiMo-7B-RL-Zero | RL model trained from the base model | [HuggingFace](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL-Zero) | [ModelScope](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-RL-Zero) |
| MiMo-7B-SFT | SFT model trained from the base model | [HuggingFace](https://huggingface.co/XiaomiMiMo/MiMo-7B-SFT) | [ModelScope](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-SFT) |
| MiMo-7B-RL | RL model trained from the SFT model, superior performance matching OpenAI o1-mini | [HuggingFace](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL) | [ModelScope](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-RL) |
| MiMo-7B-RL-0530 | Advanced RL model with extended length | [HuggingFace](https://huggingface.co/XiaomiMiMo/MiMo-7B-RL-0530) | [ModelScope](https://www.modelscope.cn/models/XiaomiMiMo/MiMo-7B-RL-0530) |
Evaluation Results
| Benchmark | GPT-4o-0513 | Claude-3.5-Sonnet-1022 | OpenAI o1-mini | QwQ-32B-Preview | R1-Distill-Qwen-14B | R1-Distill-Qwen-7B | MiMo-7B-RL |
|---|---|---|---|---|---|---|---|
| General | | | | | | | |
| GPQA Diamond (Pass@1) | 49.9 | 65.0 | 60.0 | 54.5 | 59.1 | 49.1 | 54.4 |
| SuperGPQA (Pass@1) | 42.4 | 48.2 | 45.2 | 43.6 | 40.6 | 28.9 | 40.5 |
| DROP (3-shot F1) | 83.7 | 88.3 | 83.9 | 71.2 | 85.5 | 77.0 | 78.7 |
| MMLU-Pro (EM) | 72.6 | 78.0 | 80.3 | 52.0 | 68.8 | 53.5 | 58.6 |
| IF-Eval (Prompt Strict) | 84.3 | 86.5 | 84.8 | 40.4 | 78.3 | 60.5 | 61.0 |
| Mathematics | | | | | | | |
| MATH-500 (Pass@1) | 74.6 | 78.3 | 90.0 | 90.6 | 93.9 | 92.8 | 95.8 |
| AIME 2024 (Pass@1) | 9.3 | 16.0 | 63.6 | 50.0 | 69.7 | 55.5 | 68.2 |
| AIME 2025 (Pass@1) | 11.6 | 7.4 | 50.7 | 32.4 | 48.2 | 38.8 | 55.4 |
| Code | | | | | | | |
| LiveCodeBench v5 (Pass@1) | 32.9 | 38.9 | 53.8 | 41.9 | 53.1 | 37.6 | 57.8 |
| LiveCodeBench v6 (Pass@1) | 30.9 | 37.2 | 46.8 | 39.1 | 31.9 | 23.9 | 49.3 |
MiMo-7B series

| Benchmark | MiMo-7B-Base | MiMo-7B-RL-Zero | MiMo-7B-SFT | MiMo-7B-RL | MiMo-7B-RL-0530 |
|---|---|---|---|---|---|
| Mathematics | | | | | |
| MATH500 (Pass@1) | 37.4 | 93.6 | 93.0 | 95.8 | 97.2 |
| AIME 2024 (Pass@1) | 32.9 | 56.4 | 58.7 | 68.2 | 80.1 |
| AIME 2025 (Pass@1) | 24.3 | 46.3 | 44.3 | 55.4 | 70.2 |
| Code | | | | | |
| LiveCodeBench v5 (Pass@1) | 32.9 | 49.1 | 52.3 | 57.8 | 60.9 |
| LiveCodeBench v6 (Pass@1) | 29.1 | 42.9 | 45.5 | 49.3 | 52.2 |
Important Note
The evaluations are conducted with temperature = 0.6. AIME24 and AIME25 scores are averaged over 32 repetitions. LiveCodeBench v5 (20240801-20250201), LiveCodeBench v6 (20250201-20250501), GPQA-Diamond, and IF-Eval scores are averaged over 8 repetitions. MATH500 and SuperGPQA are single runs.
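Equivalently, for the repeated benchmarks the reported number is the average Pass@1 over $k$ independent runs:

$$
\text{score} = \frac{1}{k} \sum_{i=1}^{k} \text{Pass@1}^{(i)}, \qquad k = 32 \text{ (AIME)},\ k = 8 \text{ (LiveCodeBench, GPQA-Diamond, IF-Eval)},\ k = 1 \text{ otherwise.}
$$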
Deployment
SGLang Inference
MiMo is supported in SGLang's main branch, with MTP support coming soon. See the example script in the Installation section above; detailed usage can be found in the SGLang documentation.
vLLM inference
- [Recommended] We officially support inference with MiMo-MTP using our fork of vLLM.
- Or, you can register a vLLM loader for MiMo without loading MTP parameters.
HuggingFace inference
You can use the provided example script to perform inference.
Recommended environment and prompts
- We recommend using our fork of vLLM, which is based on vLLM 0.7.3.
- We recommend using an empty system prompt.
Usage Tip
We haven't verified MiMo with other inference engines and welcome contributions based on the model definition in the Huggingface repo.
Technical Details
Introduction
Currently, most successful RL work, including open-source research, relies on relatively large base models, e.g., 32B models, especially for enhancing code reasoning capabilities. Moreover, it has been widely believed that achieving uniform and simultaneous improvements in both mathematical and code capabilities within a small model is challenging.
Nonetheless, we believe that the effectiveness of an RL-trained reasoning model relies on the inherent reasoning potential of the base model. To fully unlock the reasoning potential of language models, efforts must focus not only on post-training but also on pre-training strategies tailored to reasoning.
In this work, we present MiMo-7B, a series of models trained from scratch and born for reasoning tasks. Our RL experiments on MiMo-7B-Base show that the model possesses extraordinary reasoning potential, even surpassing much larger 32B models. Additionally, we perform RL training on a cold-started SFT model, resulting in MiMo-7B-RL, which demonstrates superior performance on both mathematics and code reasoning tasks, matching the performance of OpenAI o1-mini.
Model Architecture
The MTP layers of MiMo-7B are tuned during pre-training and SFT and frozen during RL. With one MTP layer for speculative decoding, the acceptance rate is approximately 90%.
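Conceptually, the MTP layer acts as a cheap draft head for speculative decoding: it proposes the next token, and the main model verifies it in the forward pass it would run anyway. A minimal greedy-decoding sketch, where `mtp_draft_next_token` and `main_model_forward` are hypothetical placeholder helpers, not part of the released code:

```python
# Illustrative sketch of speculative decoding with a single MTP draft head.
# `mtp_draft_next_token` and `main_model_forward` are hypothetical placeholders.
# With ~90% acceptance, most steps emit two tokens per main-model forward pass.

def speculative_step(context: list[int],
                     mtp_draft_next_token,
                     main_model_forward) -> list[int]:
    # The MTP head cheaply drafts a candidate for the next position.
    draft = mtp_draft_next_token(context)
    # One main-model forward pass over context + [draft] returns its own
    # prediction at the draft position and the token following the draft.
    verify_tok, next_tok = main_model_forward(context + [draft])
    if verify_tok == draft:
        return [draft, next_tok]   # accepted: two tokens from one forward pass
    return [verify_tok]            # rejected: keep the main model's token only
```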
RL Infrastructure
We develop a Seamless Rollout Engine to accelerate RL training and validation. Our design integrates continuous rollout, asynchronous reward computation, and early termination to minimize GPU idle time, achieving $2.29\times$ faster training and $1.96\times$ faster validation. We also support MTP in vLLM and enhance the robustness of the inference engine in the RL system.
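A toy sketch of the idea behind asynchronous reward computation: completed rollouts are handed to reward workers immediately, so verification overlaps with ongoing generation instead of blocking it. This is illustrative only; `generate_rollout`, `verify`, and the worker count are assumptions, not the actual engine:

```python
# Toy sketch of overlapping rollout generation with asynchronous reward
# computation (illustrative only; generate_rollout / verify are placeholders).
from concurrent.futures import ThreadPoolExecutor

def rollout_with_async_rewards(prompts, generate_rollout, verify, num_reward_workers=8):
    rewards = {}
    with ThreadPoolExecutor(max_workers=num_reward_workers) as pool:
        futures = {}
        for prompt in prompts:
            completion = generate_rollout(prompt)               # GPU-bound generation
            futures[prompt] = pool.submit(verify, completion)   # verification overlaps
        for prompt, fut in futures.items():
            rewards[prompt] = fut.result()
    return rewards
```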
License
This project is licensed under the MIT license.

