MiMo-7B: Unlocking the Reasoning Potential of Language Model
MiMo-7B is a series of models trained from scratch and born for reasoning tasks. It unlocks the reasoning potential of language models through both pre-training and post-training strategies, delivering excellent performance on mathematics and code reasoning tasks.
🚀 Quick Start
To get started with the MiMo-7B series models, choose the inference backend that fits your needs. Examples for each follow:
SGLang Inference
```bash
# Install the latest SGLang from the main branch
python3 -m uv pip install "sglang[all] @ git+https://github.com/sgl-project/sglang.git/@main#egg=sglang&subdirectory=python"

# Launch the SGLang server with the MiMo-7B-SFT checkpoint
python3 -m sglang.launch_server --model-path XiaomiMiMo/MiMo-7B-SFT --host 0.0.0.0 --trust-remote-code
```
Detailed usage can be found in the SGLang documentation. MTP support will also be available within 24 hours.
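Once the server is up, it exposes an OpenAI-compatible API. A minimal query sketch, assuming the server's default port 30000:

```python
import requests

# Assumes the server launched above is listening on the default port 30000.
response = requests.post(
    "http://localhost:30000/v1/chat/completions",
    json={
        "model": "XiaomiMiMo/MiMo-7B-SFT",
        "messages": [{"role": "user", "content": "What is 17 * 24?"}],
        "temperature": 0.6,  # matches the evaluation setting used below
    },
)
print(response.json()["choices"][0]["message"]["content"])
```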
vLLM Inference
Recommended method
```python
from vllm import LLM, SamplingParams

model_path = "/path/to/MiMo"

# num_speculative_tokens=1 turns on MTP-based speculative decoding.
llm = LLM(
    model=model_path,
    trust_remote_code=True,
    num_speculative_tokens=1,
    disable_log_stats=False,
)
sampling_params = SamplingParams(temperature=0.6)

conversation = [
    {
        "role": "system",
        "content": "",  # an empty system prompt is recommended
    },
    {
        "role": "user",
        "content": "Write an essay about the importance of higher education.",
    },
]

outputs = llm.chat(conversation, sampling_params=sampling_params, use_tqdm=False)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

print("=" * 80)
```
Alternative method
```python
import register_mimo_in_vllm  # registers the MiMo model class with vanilla vLLM

from vllm import LLM, SamplingParams

model_path = "/path/to/MiMo"

llm = LLM(
    model=model_path,
    trust_remote_code=True,
    # MTP speculative decoding is unavailable in this path, so the
    # num_speculative_tokens argument stays commented out.
    # num_speculative_tokens=1,
    disable_log_stats=False,
)
sampling_params = SamplingParams(temperature=0.6)
# The chat call and output handling are identical to the recommended method above.
```
HuggingFace Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaomiMiMo/MiMo-7B-SFT"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Plain text completion from a prefix
inputs = tokenizer(["Today is"], return_tensors="pt")
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
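For chat-style prompts, the tokenizer's chat template can be applied before generation. A minimal sketch continuing from the snippet above, assuming the checkpoint ships a chat template; `max_new_tokens=512` is an illustrative choice:

```python
# Build a chat prompt with the model's own template, then generate.
messages = [{"role": "user", "content": "Write an essay about the importance of higher education."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```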
✨ Features
Pre-Training: Base Model Born for Reasoning
- Optimize the data pre-processing pipeline, enhance text extraction toolkits, and apply multi-dimensional data filtering to increase the density of reasoning patterns in the pre-training data. Multiple strategies are also used to generate massive, diverse synthetic reasoning data.
- Adopt a three-stage data mixture strategy for pre-training. MiMo-7B-Base is pre-trained on approximately 25 trillion tokens.
- Incorporate Multiple-Token Prediction (MTP) as an additional training objective to enhance model performance and accelerate inference; a conceptual sketch of the objective follows this list.
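To make the objective concrete, here is a minimal sketch of a next-token loss combined with a one-step-ahead MTP term. The single extra head, the tensor shapes, and the weight `mtp_weight` are illustrative assumptions, not MiMo's actual architecture or coefficients:

```python
import torch
import torch.nn.functional as F

def combined_mtp_loss(main_logits, mtp_logits, tokens, mtp_weight=0.3):
    """Sketch: next-token prediction plus a one-step-ahead MTP term.

    main_logits: (B, T, V) - position t predicts token t+1 (standard LM head).
    mtp_logits:  (B, T, V) - position t additionally predicts token t+2.
    tokens:      (B, T)    - input token ids.
    mtp_weight is an illustrative coefficient, not MiMo's actual value.
    """
    vocab = main_logits.size(-1)
    # Standard next-token objective.
    ntp = F.cross_entropy(main_logits[:, :-1].reshape(-1, vocab),
                          tokens[:, 1:].reshape(-1))
    # Auxiliary objective: predict one more token ahead.
    mtp = F.cross_entropy(mtp_logits[:, :-2].reshape(-1, vocab),
                          tokens[:, 2:].reshape(-1))
    return ntp + mtp_weight * mtp
```

At inference time the MTP head can propose draft tokens for speculative decoding, which is what `num_speculative_tokens=1` enables in the vLLM example above.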
Post-Training Recipe: Pioneering Reasoning Model
- Curate 130K mathematics and code problems as RL training data, all verifiable by rule-based verifiers. Each problem undergoes careful cleaning and difficulty assessment to ensure quality. Only rule-based accuracy rewards are employed, to avoid potential reward hacking.
- Introduce a test-difficulty-driven code reward to mitigate the sparse-reward issue on challenging code problems. Assigning fine-grained scores to test cases of varying difficulty lets the policy be optimized more effectively via a dense reward signal; a toy sketch appears after this list.
- Implement a data re-sampling strategy for easy problems to enhance rollout sampling efficiency and stabilize policy updates, especially in the later phases of RL training.
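As a toy illustration of the idea (not MiMo's exact scoring rule), a reward that grants partial credit weighted by per-test difficulty could look like this:

```python
def difficulty_driven_reward(test_results, difficulties):
    """Toy sketch of a test-difficulty-driven code reward.

    test_results: list of bools - whether each test case passed.
    difficulties: per-test weights - harder tests are worth more.
    """
    total = sum(difficulties)
    earned = sum(d for passed, d in zip(test_results, difficulties) if passed)
    return earned / total if total > 0 else 0.0

# A solution that passes only the two easy tests still earns partial credit,
# giving a dense signal instead of an all-or-nothing reward.
print(difficulty_driven_reward([True, True, False], [1.0, 1.0, 3.0]))  # 0.4
```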
RL Infrastructure
- Develop a Seamless Rollout Engine to accelerate RL training and validation. The design integrates continuous rollout, asynchronous reward computation, and early termination to minimize GPU idle time, achieving $2.29\times$ faster training and $1.96\times$ faster validation; see the sketch after this list.
- Support MTP in vLLM and enhance the robustness of the inference engine in the RL system.
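A highly simplified sketch of one piece of the idea: reward computation is launched as an asynchronous task so generation never waits on the verifier. All function names and timings here are hypothetical stand-ins, not the engine's actual API:

```python
import asyncio

async def generate_rollout(prompt):
    await asyncio.sleep(0.10)  # stands in for LLM decoding on the GPU
    return f"completion for {prompt}"

async def compute_reward(completion):
    await asyncio.sleep(0.05)  # stands in for rule-based verification on CPUs
    return float(len(completion) % 2)  # placeholder reward

async def rollout_loop(prompts):
    reward_tasks = []
    for prompt in prompts:
        completion = await generate_rollout(prompt)
        # Kick off reward computation without blocking, so the next
        # rollout starts immediately instead of the GPU idling.
        reward_tasks.append(asyncio.create_task(compute_reward(completion)))
    return await asyncio.gather(*reward_tasks)

rewards = asyncio.run(rollout_loop([f"problem-{i}" for i in range(4)]))
print(rewards)
```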
📦 Installation
The installation steps are included in the inference examples above: SGLang must be installed from its main branch; vLLM can be used either via the official fork or by registering a MiMo loader in vanilla vLLM; HuggingFace inference requires the transformers library.
💻 Usage Examples
The usage examples are provided in the "Quick Start" section, including code examples for SGLang, vLLM, and HuggingFace inference.
📚 Documentation
- Detailed SGLang usage can be found in the SGLang documentation.
- The technical report can be found at https://arxiv.org/abs/2505.07608.
- The models are available on HuggingFace and ModelScope.
🔧 Technical Details
- The MTP layers of MiMo-7B are tuned during pre-training and SFT and frozen during RL. With one MTP layer used for speculative decoding, the acceptance rate is about 90%.
- The RL experiments on MiMo-7B-Base show that the model has extraordinary reasoning potential, even surpassing much larger 32B models. MiMo-7B-RL, obtained by RL training from a cold-started SFT model, demonstrates superior performance on both mathematics and code reasoning tasks, matching the performance of OpenAI o1-mini.
📄 License
This model repository is licensed under the MIT License.
Model Information
Property | Details |
---|---|
Model Type | MiMo-7B series (MiMo-7B-Base, MiMo-7B-RL-Zero, MiMo-7B-SFT, MiMo-7B-RL) |
Training Data | Pre-training: approximately 25 trillion tokens; Post-training: 130K mathematics and code problems |
Evaluation Results
General Benchmarks
Benchmark | GPT-4o-0513 | Claude-3.5-Sonnet-1022 | OpenAI o1-mini | QwQ-32B-Preview | R1-Distill-Qwen-14B | R1-Distill-Qwen-7B | MiMo-7B-RL |
---|---|---|---|---|---|---|---|
GPQA Diamond (Pass@1) | 49.9 | 65.0 | 60.0 | 54.5 | 59.1 | 49.1 | 54.4 |
SuperGPQA (Pass@1) | 42.4 | 48.2 | 45.2 | 43.6 | 40.6 | 28.9 | 40.5 |
DROP (3-shot F1) | 83.7 | 88.3 | 83.9 | 71.2 | 85.5 | 77.0 | 78.7 |
MMLU-Pro (EM) | 72.6 | 78.0 | 80.3 | 52.0 | 68.8 | 53.5 | 58.6 |
IF-Eval (Prompt Strict) | 84.3 | 86.5 | 84.8 | 40.4 | 78.3 | 60.5 | 61.0 |
Mathematics Benchmarks
Benchmark | GPT-4o-0513 | Claude-3.5-Sonnet-1022 | OpenAI o1-mini | QwQ-32B-Preview | R1-Distill-Qwen-14B | R1-Distill-Qwen-7B | MiMo-7B-RL |
---|---|---|---|---|---|---|---|
MATH-500 (Pass@1) | 74.6 | 78.3 | 90.0 | 90.6 | 93.9 | 92.8 | 95.8 |
AIME 2024 (Pass@1) | 9.3 | 16.0 | 63.6 | 50.0 | 69.7 | 55.5 | 68.2 |
AIME 2025 (Pass@1) | 11.6 | 7.4 | 50.7 | 32.4 | 48.2 | 38.8 | 55.4 |
Code Benchmarks
Benchmark | GPT-4o-0513 | Claude-3.5-Sonnet-1022 | OpenAI o1-mini | QwQ-32B-Preview | R1-Distill-Qwen-14B | R1-Distill-Qwen-7B | MiMo-7B-RL |
---|---|---|---|---|---|---|---|
LiveCodeBench v5 (Pass@1) | 32.9 | 38.9 | 53.8 | 41.9 | 53.1 | 37.6 | 57.8 |
LiveCodeBench v6 (Pass@1) | 30.9 | 37.2 | 46.8 | 39.1 | 31.9 | 23.9 | 49.3 |
MiMo-7B Series Benchmarks
Benchmark | MiMo-7B-Base | MiMo-7B-RL-Zero | MiMo-7B-SFT | MiMo-7B-RL |
---|---|---|---|---|
MATH-500 (Pass@1) | 37.4 | 93.6 | 93.0 | 95.8 |
AIME 2024 (Pass@1) | 32.9 | 56.4 | 58.7 | 68.2 |
AIME 2025 (Pass@1) | 24.3 | 46.3 | 44.3 | 55.4 |
LiveCodeBench v5 (Pass@1) | 32.9 | 49.1 | 52.3 | 57.8 |
LiveCodeBench v6 (Pass@1) | 29.1 | 42.9 | 45.5 | 49.3 |
⚠️ Important Note
All evaluations are conducted with `temperature = 0.6`. AIME24 and AIME25 report the averaged score of 32 repetitions; LiveCodeBench v5 (20240801-20250201), LiveCodeBench v6 (20250201-20250501), GPQA-Diamond, and IF-Eval report the averaged score of 8 repetitions; MATH-500 and SuperGPQA are single runs.

