MiMo-7B: Unlocking the Reasoning Potential of Language Model
MiMo-7B is a series of models trained from scratch and born for reasoning tasks. It unlocks the reasoning potential of language models through both pre-training and post-training strategies, delivering excellent performance on mathematics and code reasoning tasks.
🚀 Quick Start
To get started with the MiMo-7B series models, choose the inference backend that fits your needs. Examples for each follow:
SGLang Inference
```bash
# Install the latest SGLang from the main branch
python3 -m uv pip install "sglang[all] @ git+https://github.com/sgl-project/sglang.git/@main#egg=sglang&subdirectory=python"

# Launch the SGLang server with the MiMo-7B-SFT checkpoint
python3 -m sglang.launch_server --model-path XiaomiMiMo/MiMo-7B-SFT --host 0.0.0.0 --trust-remote-code
```
Detailed usage can be found in the SGLang documentation. MTP support will also be available within 24 hours.
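Once the server is up, it exposes an OpenAI-compatible API. A minimal query sketch, assuming the server's default port 30000:

```python
import requests

# Assumes the server launched above is listening on the default port 30000.
response = requests.post(
    "http://localhost:30000/v1/chat/completions",
    json={
        "model": "XiaomiMiMo/MiMo-7B-SFT",
        "messages": [{"role": "user", "content": "What is 17 * 24?"}],
        "temperature": 0.6,  # matches the evaluation setting used below
    },
)
print(response.json()["choices"][0]["message"]["content"])
```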
vLLM Inference
Recommended method
```python
from vllm import LLM, SamplingParams

model_path = "/path/to/MiMo"

# num_speculative_tokens=1 turns on MTP-based speculative decoding.
llm = LLM(
    model=model_path,
    trust_remote_code=True,
    num_speculative_tokens=1,
    disable_log_stats=False,
)
sampling_params = SamplingParams(temperature=0.6)

conversation = [
    {
        "role": "system",
        "content": "",  # an empty system prompt is recommended
    },
    {
        "role": "user",
        "content": "Write an essay about the importance of higher education.",
    },
]

outputs = llm.chat(conversation, sampling_params=sampling_params, use_tqdm=False)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

print("=" * 80)
```
Alternative method
```python
import register_mimo_in_vllm  # registers the MiMo model class with vanilla vLLM

from vllm import LLM, SamplingParams

model_path = "/path/to/MiMo"

llm = LLM(
    model=model_path,
    trust_remote_code=True,
    # MTP speculative decoding is unavailable in this path, so the
    # num_speculative_tokens argument stays commented out.
    # num_speculative_tokens=1,
    disable_log_stats=False,
)
sampling_params = SamplingParams(temperature=0.6)
# The chat call and output handling are identical to the recommended method above.
```
HuggingFace Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaomiMiMo/MiMo-7B-SFT"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Plain text completion from a prefix
inputs = tokenizer(["Today is"], return_tensors="pt")
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
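For chat-style prompts, the tokenizer's chat template can be applied before generation. A minimal sketch continuing from the snippet above, assuming the checkpoint ships a chat template; `max_new_tokens=512` is an illustrative choice:

```python
# Build a chat prompt with the model's own template, then generate.
messages = [{"role": "user", "content": "Write an essay about the importance of higher education."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```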
✨ Features
Pre-Training: Base Model Born for Reasoning
- Optimize the data pre-processing pipeline, enhance text extraction toolkits, and apply multi-dimensional data filtering to increase the density of reasoning patterns in the pre-training data. Multiple strategies are also used to generate massive, diverse synthetic reasoning data.
- Adopt a three-stage data mixture strategy for pre-training. MiMo-7B-Base is pre-trained on approximately 25 trillion tokens.
- Incorporate Multiple-Token Prediction (MTP) as an additional training objective to enhance model performance and accelerate inference; a conceptual sketch of the objective follows this list.
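To make the objective concrete, here is a minimal sketch of a next-token loss combined with a one-step-ahead MTP term. The single extra head, the tensor shapes, and the weight `mtp_weight` are illustrative assumptions, not MiMo's actual architecture or coefficients:

```python
import torch
import torch.nn.functional as F

def combined_mtp_loss(main_logits, mtp_logits, tokens, mtp_weight=0.3):
    """Sketch: next-token prediction plus a one-step-ahead MTP term.

    main_logits: (B, T, V) - position t predicts token t+1 (standard LM head).
    mtp_logits:  (B, T, V) - position t additionally predicts token t+2.
    tokens:      (B, T)    - input token ids.
    mtp_weight is an illustrative coefficient, not MiMo's actual value.
    """
    vocab = main_logits.size(-1)
    # Standard next-token objective.
    ntp = F.cross_entropy(main_logits[:, :-1].reshape(-1, vocab),
                          tokens[:, 1:].reshape(-1))
    # Auxiliary objective: predict one more token ahead.
    mtp = F.cross_entropy(mtp_logits[:, :-2].reshape(-1, vocab),
                          tokens[:, 2:].reshape(-1))
    return ntp + mtp_weight * mtp
```

At inference time the MTP head can propose draft tokens for speculative decoding, which is what `num_speculative_tokens=1` enables in the vLLM example above.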
Post-Training Recipe: Pioneering Reasoning Model
- Curate 130K mathematics and code problems as RL training data, all verifiable by rule-based verifiers. Each problem undergoes careful cleaning and difficulty assessment to ensure quality. Only rule-based accuracy rewards are employed, to avoid potential reward hacking.
- Introduce a test-difficulty-driven code reward to mitigate the sparse-reward issue on challenging code problems. Assigning fine-grained scores to test cases of varying difficulty lets the policy be optimized more effectively via a dense reward signal; a toy sketch appears after this list.
- Implement a data re-sampling strategy for easy problems to enhance rollout sampling efficiency and stabilize policy updates, especially in the later phases of RL training.
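As a toy illustration of the idea (not MiMo's exact scoring rule), a reward that grants partial credit weighted by per-test difficulty could look like this:

```python
def difficulty_driven_reward(test_results, difficulties):
    """Toy sketch of a test-difficulty-driven code reward.

    test_results: list of bools - whether each test case passed.
    difficulties: per-test weights - harder tests are worth more.
    """
    total = sum(difficulties)
    earned = sum(d for passed, d in zip(test_results, difficulties) if passed)
    return earned / total if total > 0 else 0.0

# A solution that passes only the two easy tests still earns partial credit,
# giving a dense signal instead of an all-or-nothing reward.
print(difficulty_driven_reward([True, True, False], [1.0, 1.0, 3.0]))  # 0.4
```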
RL Infrastructure
- Develop a Seamless Rollout Engine to accelerate RL training and validation. The design integrates continuous rollout, asynchronous reward computation, and early termination to minimize GPU idle time, achieving $2.29\times$ faster training and $1.96\times$ faster validation; see the sketch after this list.
- Support MTP in vLLM and enhance the robustness of the inference engine in the RL system.
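A highly simplified sketch of one piece of the idea: reward computation is launched as an asynchronous task so generation never waits on the verifier. All function names and timings here are hypothetical stand-ins, not the engine's actual API:

```python
import asyncio

async def generate_rollout(prompt):
    await asyncio.sleep(0.10)  # stands in for LLM decoding on the GPU
    return f"completion for {prompt}"

async def compute_reward(completion):
    await asyncio.sleep(0.05)  # stands in for rule-based verification on CPUs
    return float(len(completion) % 2)  # placeholder reward

async def rollout_loop(prompts):
    reward_tasks = []
    for prompt in prompts:
        completion = await generate_rollout(prompt)
        # Kick off reward computation without blocking, so the next
        # rollout starts immediately instead of the GPU idling.
        reward_tasks.append(asyncio.create_task(compute_reward(completion)))
    return await asyncio.gather(*reward_tasks)

rewards = asyncio.run(rollout_loop([f"problem-{i}" for i in range(4)]))
print(rewards)
```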
📦 Installation
The installation steps are included in the inference examples above: SGLang must be installed from its main branch; vLLM can be used either via the official fork or by registering a MiMo loader in vanilla vLLM; HuggingFace inference requires the transformers library.
💻 Usage Examples
The usage examples are provided in the "Quick Start" section, including code examples for SGLang, vLLM, and HuggingFace inference.
📚 Documentation
- Detailed SGLang usage can be found in the SGLang documentation.
- The technical report can be found at https://arxiv.org/abs/2505.07608.
- The models are available on HuggingFace and ModelScope.
🔧 Technical Details
- The MTP layers of MiMo-7B are tuned during pre-training and SFT and frozen during RL. With one MTP layer used for speculative decoding, the acceptance rate is about 90%.
- The RL experiments on MiMo-7B-Base show that the model has extraordinary reasoning potential, even surpassing much larger 32B models. MiMo-7B-RL, obtained by RL training from a cold-started SFT model, demonstrates superior performance on both mathematics and code reasoning tasks, matching the performance of OpenAI o1-mini.
📄 License
This model repository is licensed under the MIT License.
Model Information
Property | Details |
---|---|
Model Type | MiMo-7B series (MiMo-7B-Base, MiMo-7B-RL-Zero, MiMo-7B-SFT, MiMo-7B-RL) |
Training Data | Pre-training: approximately 25 trillion tokens; Post-training: 130K mathematics and code problems |
Evaluation Results
General Benchmarks
Benchmark | GPT-4o-0513 | Claude-3.5-Sonnet-1022 | OpenAI o1-mini | QwQ-32B-Preview | R1-Distill-Qwen-14B | R1-Distill-Qwen-7B | MiMo-7B-RL |
---|---|---|---|---|---|---|---|
GPQA Diamond (Pass@1) | 49.9 | 65.0 | 60.0 | 54.5 | 59.1 | 49.1 | 54.4 |
SuperGPQA (Pass@1) | 42.4 | 48.2 | 45.2 | 43.6 | 40.6 | 28.9 | 40.5 |
DROP (3-shot F1) | 83.7 | 88.3 | 83.9 | 71.2 | 85.5 | 77.0 | 78.7 |
MMLU-Pro (EM) | 72.6 | 78.0 | 80.3 | 52.0 | 68.8 | 53.5 | 58.6 |
IF-Eval (Prompt Strict) | 84.3 | 86.5 | 84.8 | 40.4 | 78.3 | 60.5 | 61.0 |
Mathematics Benchmarks
Benchmark | GPT-4o-0513 | Claude-3.5-Sonnet-1022 | OpenAI o1-mini | QwQ-32B-Preview | R1-Distill-Qwen-14B | R1-Distill-Qwen-7B | MiMo-7B-RL |
---|---|---|---|---|---|---|---|
MATH-500 (Pass@1) | 74.6 | 78.3 | 90.0 | 90.6 | 93.9 | 92.8 | 95.8 |
AIME 2024 (Pass@1) | 9.3 | 16.0 | 63.6 | 50.0 | 69.7 | 55.5 | 68.2 |
AIME 2025 (Pass@1) | 11.6 | 7.4 | 50.7 | 32.4 | 48.2 | 38.8 | 55.4 |
Code Benchmarks
Benchmark | GPT-4o-0513 | Claude-3.5-Sonnet-1022 | OpenAI o1-mini | QwQ-32B-Preview | R1-Distill-Qwen-14B | R1-Distill-Qwen-7B | MiMo-7B-RL |
---|---|---|---|---|---|---|---|
LiveCodeBench v5 (Pass@1) | 32.9 | 38.9 | 53.8 | 41.9 | 53.1 | 37.6 | 57.8 |
LiveCodeBench v6 (Pass@1) | 30.9 | 37.2 | 46.8 | 39.1 | 31.9 | 23.9 | 49.3 |
MiMo-7B Series Benchmarks
Benchmark | MiMo-7B-Base | MiMo-7B-RL-Zero | MiMo-7B-SFT | MiMo-7B-RL |
---|---|---|---|---|
MATH-500 (Pass@1) | 37.4 | 93.6 | 93.0 | 95.8 |
AIME 2024 (Pass@1) | 32.9 | 56.4 | 58.7 | 68.2 |
AIME 2025 (Pass@1) | 24.3 | 46.3 | 44.3 | 55.4 |
LiveCodeBench v5 (Pass@1) | 32.9 | 49.1 | 52.3 | 57.8 |
LiveCodeBench v6 (Pass@1) | 29.1 | 42.9 | 45.5 | 49.3 |
⚠️ Important Note
All evaluations are conducted with `temperature = 0.6`. AIME24 and AIME25 report the averaged score of 32 repetitions; LiveCodeBench v5 (20240801-20250201), LiveCodeBench v6 (20250201-20250501), GPQA-Diamond, and IF-Eval report the averaged score of 8 repetitions; MATH-500 and SuperGPQA are single runs.

