Phi-4-mini-reasoning Open Source Model - Supports High-Quality Reasoning and Enhances Mathematical Problem-Solving Ability

Phi 4 Mini Reasoning

Developed by Microsoft

Phi-4-mini-reasoning is a lightweight open-source model built on high-quality, reasoning-dense data and further fine-tuned for advanced mathematical reasoning.

Large Language Model

Transformers

Supports Multiple Languages · Open Source License: MIT · #Mathematical reasoning #Lightweight model #128K long text

Downloads 18.93k

Release Time: 4/29/2025

Model Overview

This model is trained on synthetic data and supports a 128K-token context length. It is designed for multi-step, logic-intensive mathematical problem-solving in memory- or compute-constrained environments and in latency-bound scenarios.

Model Features

Lightweight design

Designed for memory/computation-constrained environments and latency-constrained scenarios, suitable for deployment on edge or mobile systems.

Advanced mathematical reasoning

Fine-tuned with synthetic data, excels at solving multi-step, logic-intensive mathematical problems.

Long context support

Supports a context length of 128K, suitable for reasoning tasks that require maintaining a long context.

Efficient reasoning

Performs strongly on reasoning benchmarks such as AIME, MATH-500, and GPQA Diamond, and is competitive with larger models.

Model Capabilities

Mathematical reasoning

Formal proof generation

Symbolic computation

Advanced word problem solving

Use Cases

Education

Mathematics tutoring

Used for embedded tutoring in educational applications to help students solve complex mathematical problems.

Provides step-by-step problem solutions to help students understand the problem-solving process.
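The example equation 3x^2+4x+5=1 used in this card's chat-format and inference samples shows what such a step-by-step solution looks like. The derivation below was worked by hand and is not model output. It can serve as a reference when you check the model's answer.

```latex
\begin{aligned}
3x^2 + 4x + 5 = 1 \;&\Longrightarrow\; 3x^2 + 4x + 4 = 0 \qquad (a = 3,\ b = 4,\ c = 4) \\
\Delta = b^2 - 4ac &= 4^2 - 4 \cdot 3 \cdot 4 = 16 - 48 = -32 < 0 \\
x = \frac{-b \pm \sqrt{\Delta}}{2a} &= \frac{-4 \pm 4\sqrt{2}\, i}{6} = \frac{-2 \pm 2\sqrt{2}\, i}{3}
\end{aligned}
```

The discriminant is negative, so the equation has no real solutions. Its two roots form a complex-conjugate pair.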

Edge computing

Edge device deployment

Provides high-quality, step-by-step problem-solving capabilities in compute- or latency-constrained environments.

Enables efficient mathematical reasoning and problem-solving on edge devices.

🚀 Phi-4-mini-reasoning

Phi-4-mini-reasoning is a lightweight open model built on high-quality, reasoning-dense data and fine-tuned for advanced math reasoning.

🚀 Quick Start

Phi-4-mini-reasoning is a powerful model for mathematical reasoning. To get started, you can refer to the usage section below for details on tokenization, input formats, and inference.

✨ Features

Optimized for Math Reasoning: Designed for multi-step, logic-intensive mathematical problem-solving tasks, especially in memory/compute-constrained environments and latency-bound scenarios.
High-Quality Output: Capable of maintaining context across steps, applying structured logic, and delivering accurate solutions in mathematical reasoning domains.
Compact Size: Balances reasoning ability with efficiency, suitable for educational applications, embedded tutoring, and lightweight deployment on edge or mobile systems.

📦 Installation

Phi-4-mini-reasoning has been integrated into version 4.51.3 of transformers. You can check the installed transformers version with: pip list | grep transformers. Python 3.8 and 3.10 work best. The required packages are:

flash_attn==2.7.4.post1
torch==2.5.1
transformers==4.51.3
accelerate==1.3.0
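You can also check the pinned versions above from Python. The following is a minimal sketch that uses only the standard library's importlib.metadata. The helper name check_versions is hypothetical. It ignores local version suffixes such as "+cu124", which torch wheels often carry.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions from the package list in this model card.
REQUIRED = {
    "flash_attn": "2.7.4.post1",
    "torch": "2.5.1",
    "transformers": "4.51.3",
    "accelerate": "1.3.0",
}


def check_versions(required=REQUIRED):
    """Return {package: (installed, expected)} for each package that does not match.

    installed is None when the package is missing. A local version suffix
    such as "+cu124" is stripped before comparing.
    """
    mismatches = {}
    for name, expected in required.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (installed, expected)
    return mismatches


if __name__ == "__main__":
    problems = check_versions()
    for name, (installed, expected) in problems.items():
        print(f"{name}: installed={installed}, expected={expected}")
    if not problems:
        print("All pinned packages match.")
```

An empty result means that every pinned package is installed at the listed version.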

💻 Usage Examples

Basic Usage

Tokenizer

Phi-4-mini-reasoning supports a vocabulary size of up to 200064 tokens. The tokenizer files already provide placeholder tokens for downstream fine-tuning, and the vocabulary can be extended up to the model's vocabulary size.
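Before you add custom tokens, confirm that they fit within the 200064-token vocabulary. The hypothetical helpers below compare the tokenizer's current length, for example len(tokenizer) for a loaded AutoTokenizer, against that limit. If the new tokens do not fit, the embedding matrix would have to be resized, for example with model.resize_token_embeddings. Whether the existing placeholder tokens are already counted in len(tokenizer) is not stated here, so check that on your own checkpoint.

```python
# Embedding-matrix (vocabulary) size stated in this model card.
MODEL_VOCAB_SIZE = 200064


def token_headroom(tokenizer_len, model_vocab_size=MODEL_VOCAB_SIZE):
    """Number of new token ids that fit before the embeddings must be resized."""
    if tokenizer_len > model_vocab_size:
        raise ValueError("tokenizer is already larger than the model vocabulary")
    return model_vocab_size - tokenizer_len


def can_add_tokens(tokenizer_len, num_new_tokens, model_vocab_size=MODEL_VOCAB_SIZE):
    """True if num_new_tokens can be added without resizing the embeddings."""
    return num_new_tokens <= token_headroom(tokenizer_len, model_vocab_size)
```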

Input Formats

Phi-4-mini-reasoning is best suited for prompts that use the following specific format:

Chat format

This format is used for general conversation and instructions:

<|system|>Your name is Phi, an AI math expert developed by Microsoft.<|end|><|user|>How to solve 3*x^2+4*x+5=1?<|end|><|assistant|>
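The chat layout shown above is a sequence of <|role|>content<|end|> segments that ends with an <|assistant|> prefix. The sketch below rebuilds it by hand for illustration only. For real inference, prefer tokenizer.apply_chat_template, because the tokenizer's own template is authoritative and may handle details this hypothetical format_phi_chat helper does not.

```python
def format_phi_chat(messages, add_generation_prompt=True):
    """Render chat messages as <|role|>content<|end|> segments.

    When add_generation_prompt is True, an <|assistant|> prefix is appended
    so that the model continues with its answer.
    """
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ("system", "user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<|{role}|>{message['content']}<|end|>")
    if add_generation_prompt:
        parts.append("<|assistant|>")
    return "".join(parts)


prompt = format_phi_chat([
    {"role": "system", "content": "Your name is Phi, an AI math expert developed by Microsoft."},
    {"role": "user", "content": "How to solve 3*x^2+4*x+5=1?"},
])
print(prompt)
```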

Inference

After obtaining the Phi-4-mini-reasoning model checkpoints, you can use the following sample code for inference:

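import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Fix the RNG seed so that sampled outputs are reproducible across runs.
torch.random.manual_seed(0)

model_id = "microsoft/Phi-4-mini-reasoning"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",   # load the weights onto the GPU
    torch_dtype="auto",  # use the dtype recorded in the model config
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": "How to solve 3*x^2+4*x+5=1?"
}]
# Apply the model's chat template and append the assistant prefix
# so that generation starts with the model's answer.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)

# Reasoning traces can be long, hence the large max_new_tokens budget.
outputs = model.generate(
    **inputs.to(model.device),
    max_new_tokens=32768,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
)
# Drop the prompt tokens and decode only the newly generated ones.
outputs = tokenizer.batch_decode(outputs[:, inputs["input_ids"].shape[-1]:])

print(outputs[0])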
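The decoded text contains the model's full reasoning trace. If you only need the final result, a small post-processing step can extract it. This sketch assumes that the response states its final answer as a LaTeX \boxed{...} expression. That convention is common for math reasoning models, but this card does not guarantee it. The extract_boxed helper is hypothetical. It returns the contents of the last \boxed{...} and correctly handles nested braces.

```python
def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None.

    Nested braces inside the box are kept. None is also returned when the
    braces never balance, which usually means the output was truncated.
    """
    key = "\\boxed{"
    start = text.rfind(key)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(key):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None
```

For the sample above, call extract_boxed(outputs[0]) after decoding.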

📚 Documentation

Intended Uses

Primary Use Cases

Phi-4-mini-reasoning is designed for multi-step, logic-intensive mathematical problem-solving tasks in memory- or compute-constrained environments and latency-bound scenarios. Use cases include formal proof generation, symbolic computation, advanced word problems, and various mathematical reasoning scenarios.

Use Case Considerations

This model is designed and tested for math reasoning only. Developers should consider the common limitations of language models and performance differences across languages. Before using the model in a specific downstream use case, especially a high-risk one, they should evaluate and mitigate risks to accuracy, safety, and fairness. They should also adhere to applicable laws and regulations.

Release Notes

This release of Phi-4-mini-reasoning addresses user feedback and market demand for a compact reasoning model. It is optimized for mathematical reasoning, fine-tuned with synthetic math data, and balances reasoning ability with efficiency.

Model Quality

The 3.8B-parameter Phi-4-mini-reasoning model was compared with a set of models on a variety of reasoning benchmarks:

Model	AIME	MATH-500	GPQA Diamond
o1-mini*	63.6	90.0	60.0
DeepSeek-R1-Distill-Qwen-7B	53.3	91.4	49.5
DeepSeek-R1-Distill-Llama-8B	43.3	86.9	47.3
Bespoke-Stratos-7B*	20.0	82.0	37.8
OpenThinker-7B*	31.3	83.0	42.4
Llama-3.2-3B-Instruct	6.7	44.4	25.3
Phi-4-Mini (base model, 3.8B)	10.0	71.8	36.9
Phi-4-mini-reasoning (3.8B)	57.5	94.6	52.0
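The gains from reasoning fine-tuning can be read directly from the table above. The short sketch below copies three rows and computes per-benchmark differences in percentage points. The delta helper is only an illustration and is not part of any released tooling.

```python
# Scores (%) copied from the comparison table: (AIME, MATH-500, GPQA Diamond).
BENCHMARKS = ("AIME", "MATH-500", "GPQA Diamond")
SCORES = {
    "Phi-4-Mini (base model, 3.8B)": (10.0, 71.8, 36.9),
    "Phi-4-mini-reasoning (3.8B)": (57.5, 94.6, 52.0),
    "DeepSeek-R1-Distill-Qwen-7B": (53.3, 91.4, 49.5),
}


def delta(model_a, model_b):
    """Per-benchmark score difference model_a - model_b, in percentage points."""
    return {
        bench: round(a - b, 1)
        for bench, a, b in zip(BENCHMARKS, SCORES[model_a], SCORES[model_b])
    }


print(delta("Phi-4-mini-reasoning (3.8B)", "Phi-4-Mini (base model, 3.8B)"))
```

Compared with its base model, Phi-4-mini-reasoning gains 47.5 points on AIME, 22.8 on MATH-500, and 15.1 on GPQA Diamond.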

Overall, the 3.8B-parameter model achieves reasoning performance on these benchmarks comparable to that of larger models, though its small size still imposes limitations.

Training

Model

Property	Details
Architecture	Shares the same architecture as Phi-4-Mini, a dense decoder-only Transformer model with 3.8B parameters. Major changes compared to Phi-3.5-Mini are 200K vocabulary, grouped-query attention, and shared input and output embedding.
Inputs	Text, best suited for prompts in chat format.
Context length	128K tokens
GPUs	128 H100-80G
Training time	2 days
Training data	150B tokens
Outputs	Generated text
Dates	Trained in February 2025
Status	A static model trained on offline datasets with a cutoff date of February 2025 for publicly available data.
Supported languages	English
Release date	April 2025
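The architecture row above mentions grouped-query attention (GQA). GQA matters for a 128K context because it shrinks the key/value cache. The sketch below estimates the cache size. The layer and head counts (32 layers, 24 query heads, 8 key/value heads, head dimension 128) are assumptions taken from the public Phi-4-Mini configuration, so check them against the checkpoint's config.json. The estimate also assumes bf16 values (2 bytes each) and reads "128K" as 131072 tokens.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim,
                   bytes_per_value=2, batch_size=1):
    """Key/value cache size: one K and one V tensor per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value * batch_size


# ASSUMED Phi-4-Mini config values; verify against config.json.
LAYERS, Q_HEADS, KV_HEADS, HEAD_DIM = 32, 24, 8, 128
CONTEXT = 131072  # "128K" read as 128 * 1024 tokens

gqa = kv_cache_bytes(CONTEXT, LAYERS, KV_HEADS, HEAD_DIM)  # 8 shared K/V heads
mha = kv_cache_bytes(CONTEXT, LAYERS, Q_HEADS, HEAD_DIM)   # one K/V head per query head
print(f"GQA: {gqa / 2**30:.0f} GiB, MHA: {mha / 2**30:.0f} GiB")
```

Under these assumptions, one full-length sequence needs about 16 GiB of KV cache with GQA. Without GQA, with one key/value head per query head, it would need about 48 GiB.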

Training Datasets

The training data consists of synthetic mathematical content generated by DeepSeek-R1. It includes over one million diverse math problems and about 30 billion tokens of math content after verification. The dataset integrates three primary components: curated high-quality math questions, synthetic math data generated by DeepSeek-R1, and preference data for enhancing reasoning capabilities.

Software

PyTorch
Transformers
[Flash-Attention](https://github.com/HazyResearch/flash-attention)

Hardware

The Phi-4-mini-reasoning model uses flash attention by default, which requires certain types of GPU hardware. It has been tested on NVIDIA A100 and NVIDIA H100. To run the model on an NVIDIA V100 or an earlier-generation GPU, call AutoModelForCausalLM.from_pretrained() with attn_implementation="eager".
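The choice between flash attention and eager attention can be automated. This sketch assumes the FlashAttention-2 requirement of an Ampere-or-newer GPU, meaning compute capability 8.0 or higher (A100 is 8.0, H100 is 9.0, V100 is 7.0). It returns a value to pass as attn_implementation. The helper name pick_attn_implementation is hypothetical.

```python
def pick_attn_implementation(compute_capability):
    """Choose an attn_implementation string from a (major, minor) capability tuple.

    Ampere (8.x) and newer GPUs use FlashAttention-2. Older GPUs such as the
    V100 (7.0) fall back to the "eager" implementation.
    """
    major, _minor = compute_capability
    return "flash_attention_2" if major >= 8 else "eager"
```

In practice, obtain the tuple with torch.cuda.get_device_capability() and pass the result to AutoModelForCausalLM.from_pretrained(model_id, attn_implementation=...). The "flash_attention_2" path also requires the flash_attn package to be installed.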

Safety Evaluation and Red-Teaming

The Phi-4 family of models adopts a robust safety post-training approach using a variety of datasets. Phi-4-mini-reasoning was developed in accordance with Microsoft's responsible AI principles, and its safety risks were assessed using Azure AI Foundry's Risk and Safety Evaluation framework.

Responsible AI Considerations

The Phi family of models has potential limitations such as unfairness, unreliability, or offensiveness. Developers should apply responsible AI best practices, including mapping, measuring, and mitigating risks according to their specific use cases and contexts.

License

The model is licensed under the MIT license.

Trademarks

This project may contain trademarks or logos. Authorized use of Microsoft trademarks or logos must follow [Microsoft’s Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks). Use of third-party trademarks or logos is subject to their policies.

Appendix A: Benchmark Methodology

We aim to ensure an apples-to-apples comparison by using the same generation configuration for every model. The model is evaluated on three popular reasoning benchmarks: MATH-500, AIME 2024, and GPQA Diamond.
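An apples-to-apples comparison also requires scoring every model with the same rules. The sketch below is an illustration only and is not the evaluation code behind the table in this card. It computes exact-match accuracy and averages it over several sampled runs, which is one common way to report results when decoding uses temperature sampling. The helper names are hypothetical.

```python
from statistics import mean


def accuracy(predictions, references):
    """Fraction of predictions that exactly match their reference answers."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references differ in length")
    return sum(p == r for p, r in zip(predictions, references)) / len(references)


def mean_accuracy_over_runs(runs, references):
    """Average accuracy, in percent, over independently sampled runs."""
    return 100 * mean(accuracy(run, references) for run in runs)
```

Every model should be scored with the same answer-extraction and matching rules. Otherwise, differences in post-processing can look like differences in model quality.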

⚠️ Important Note

Nothing contained in this Model Card should be interpreted as or deemed a restriction or modification to the license the model is released under.

💡 Usage Tip

The model has an elevated defect rate when responding to election-critical queries. Users should verify election-related information with the election authority in their region. Also, for non-English languages, performance may be worse, and developers should test and customize the model as needed.
