Phi-4-reasoning-plus Open-Source Model - Free Advanced Reasoning for Math, Science, and Coding

Phi 4 Reasoning Plus

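Developed by Microsoft

Phi-4-reasoning-plus is an advanced open-weight reasoning model developed by Microsoft Research. It is built on Phi-4, optimized through supervised fine-tuning and reinforcement learning, and focuses on advanced reasoning in math, science, and coding.

Large language model

Transformers

Supports multiple languages | License: MIT | #math-reasoning-optimization #long-context-processing #reinforcement-learning-fine-tuning

Downloads: 19.83k

Released: 4/17/2025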

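Model Introduction

Phi-4-reasoning-plus is an efficient reasoning model optimized with chain-of-thought trace datasets and reinforcement learning. It is particularly strong at math reasoning, science questions, and code generation.

Model Features

Advanced reasoning

Optimized through supervised fine-tuning and reinforcement learning, with particular strength in math reasoning, science questions, and code generation.

Long-context processing

Supports a context length of 32k tokens and stays coherent and logically consistent over long sequences.

Structured responses

Each response is split into a chain-of-thought reasoning block and a summary block, giving both the detailed reasoning and a precise solution.

Efficient training

Trained on 16 billion tokens of data in 2.5 days on 32 H100-80G GPUs.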

Model Capabilities

Math reasoning

Science question answering

Code generation

Chat

Logical reasoning

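Use Cases

Education

Math problem solving

Solves complex math problems, such as olympiad-level questions.

Scores 78.0 on AIME 2025

Science question answering

Answers graduate-level science questions.

Scores 68.9 on GPQA-Diamond

Programming

Code generation

Generates working code and solves competitive programming problems.

Scores 53.1 on LiveCodeBench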

🚀 Phi-4-reasoning-plus Model Card

Phi-4-reasoning-plus is a reasoning model fine-tuned from Phi-4. It performs well on reasoning-intensive tasks such as math, science, and coding, with high accuracy.

🚀 Quick Start

To get the most out of the model, inference must use temperature=0.8, top_k=50, and top_p=0.95 with do_sample=True. For more complex queries, set max_new_tokens=32768 to allow a longer chain of thought (CoT).
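The recommended settings can be collected in one place. The following is a minimal sketch: the helper name `build_generation_kwargs` and the `complex_query` flag are hypothetical, and the default of 4096 new tokens is an assumption borrowed from the `transformers` example on this page rather than a documented recommendation.

```python
def build_generation_kwargs(complex_query: bool = False) -> dict:
    """Return sampling keyword arguments for model.generate (hypothetical helper)."""
    return {
        "do_sample": True,     # sampling must be enabled for the settings below to apply
        "temperature": 0.8,
        "top_k": 50,
        "top_p": 0.95,
        # 32768 leaves room for a long chain of thought on harder queries;
        # 4096 is an assumed default for simple ones.
        "max_new_tokens": 32768 if complex_query else 4096,
    }
```

The result can be unpacked into a generation call, for example `model.generate(inputs, **build_generation_kwargs(complex_query=True))`.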

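✨ Key Features

Architecture: a 14B-parameter dense decoder-only Transformer, the same base model as the previously released Phi-4.
High-quality training: trained on 16B tokens (about 8.3B unique tokens) in 2.5 days.
Long-context processing: supports a context length of 32k tokens, and in experiments it also performed well when extended to 64k tokens.
High accuracy: reinforcement learning training raises accuracy, but the model generates on average 50% more tokens, so latency rises accordingly.
Strong multi-task performance: performs well on both reasoning tasks and general benchmarks, outperforming many models with far more parameters.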
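To make the token and latency trade-off concrete, here is a rough sketch of the arithmetic. The 50% figure is the average quoted in the list; the function names, the baseline token count in the usage note, and reading "32k" as 32,768 tokens are illustrative assumptions.

```python
CONTEXT_TOKENS = 32 * 1024  # assumption: "32k" means 32,768 tokens
RL_TOKEN_FACTOR = 1.5       # "on average 50% more tokens" after reinforcement learning


def expected_tokens_after_rl(baseline_tokens: int) -> int:
    """Estimate the average response length relative to a pre-RL baseline."""
    return round(baseline_tokens * RL_TOKEN_FACTOR)


def fits_context(prompt_tokens: int, generated_tokens: int,
                 context_tokens: int = CONTEXT_TOKENS) -> bool:
    """Check whether the prompt plus the generated tokens stay within the context window."""
    return prompt_tokens + generated_tokens <= context_tokens
```

For example, a response that would take 8,000 tokens at baseline would be expected to take about 12,000, which together with a 500-token prompt still fits in a 32k window.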

📦 Installation

The source documentation does not describe installation steps, so none are given here.

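💻 Usage Examples

Basic usage

Because of the nature of the training data, always run inference with the ChatML template and the following system prompt: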

<|im_start|>system<|im_sep|>
You are Phi, a language model trained by Microsoft to help users. Your role as an assistant involves thoroughly exploring questions through a systematic thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution using the specified format: <think> {Thought section} </think> {Solution section}. In the Thought section, detail your reasoning process in steps. Each step should include detailed considerations such as analysing questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The Solution section should be logical, accurate, and concise and detail necessary steps needed to reach the conclusion. Now, try to solve the following question through the above guidelines:<|im_end|>
<|im_start|>user<|im_sep|>
What is the derivative of x^2?<|im_end|>
<|im_start|>assistant<|im_sep|>
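For illustration, the prompt layout shown above can be assembled by hand. This is a sketch that mirrors the text as printed on this page; the line break after `<|im_sep|>` is an assumption about the exact layout, and `render_prompt` is a hypothetical helper. In practice `tokenizer.apply_chat_template` is the safer choice because it applies the template shipped with the model.

```python
def render_prompt(messages: list[dict]) -> str:
    """Render role/content messages in the layout printed above (hypothetical helper)."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}<|im_sep|>\n"
            f"{message['content']}<|im_end|>\n"
        )
    # Open the assistant turn so the model continues from here.
    parts.append("<|im_start|>assistant<|im_sep|>\n")
    return "".join(parts)
```

The `messages` argument uses the same list-of-dicts shape that the `transformers` example on this page passes to `apply_chat_template`.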

Advanced usage

Using the `transformers` library

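from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-4-reasoning-plus")
# device_map="auto" (requires the accelerate package) places the weights on the available devices;
# torch_dtype="auto" keeps the dtype stored in the checkpoint.
model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-4-reasoning-plus", device_map="auto", torch_dtype="auto")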

messages = [
    {"role": "system", "content": "You are Phi, a language model trained by Microsoft to help users. Your role as an assistant involves thoroughly exploring questions through a systematic thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution using the specified format: <think> {Thought section} </think> {Solution section}. In the Thought section, detail your reasoning process in steps. Each step should include detailed considerations such as analysing questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The Solution section should be logical, accurate, and concise and detail necessary steps needed to reach the conclusion. Now, try to solve the following question through the above guidelines:"},
    {"role": "user", "content": "What is the derivative of x^2?"},
]
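# Render the messages with the model's chat template and open the assistant turn.
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")

# Sample with the recommended settings; raise max_new_tokens (up to 32768) for harder queries.
outputs = model.generate(
    inputs.to(model.device),
    max_new_tokens=4096,
    temperature=0.8,
    top_k=50,
    top_p=0.95,
    do_sample=True,
)
# Decode the full sequence: the prompt followed by the generated reasoning and solution.
print(tokenizer.decode(outputs[0]))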
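The system prompt asks the model to answer in the form `<think> {Thought section} </think> {Solution section}`, so the generated text can be split into its two parts. The sketch below is one possible way to do that with the standard library; `split_response` is a hypothetical helper, and it assumes the input is the assistant's generated text only and contains at most one `<think>` block.

```python
import re

# Non-greedy match so only the first <think>...</think> block is captured.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_response(text: str) -> tuple[str, str]:
    """Split generated text into (thought, solution); thought is empty if no block is found."""
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    thought = match.group(1).strip()
    solution = text[match.end():].strip()
    return thought, solution
```

If generation stops before `</think>` is emitted (for example because `max_new_tokens` was reached), this sketch treats the whole text as the solution, which a caller may want to handle separately.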

Using the `vllm` library

vllm serve microsoft/Phi-4-reasoning-plus --enable-reasoning --reasoning-parser deepseek_r1
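Once the server is running, it can be queried over HTTP. The sketch below builds the request and only sends it when `send` is called. It assumes vLLM's OpenAI-compatible endpoint at the default address `http://localhost:8000/v1/chat/completions` and that the server accepts `top_k` as an extra sampling field; `build_chat_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: the server was started with the default host and port.
VLLM_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(system_prompt: str, question: str,
                       max_tokens: int = 4096) -> urllib.request.Request:
    """Build a chat completion request with the recommended sampling settings."""
    payload = {
        "model": "microsoft/Phi-4-reasoning-plus",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
        "temperature": 0.8,
        "top_p": 0.95,
        "top_k": 50,  # assumption: accepted as an extra sampling parameter
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        VLLM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Pass the full system prompt from the basic usage section as `system_prompt`; a shortened prompt is likely to change the response format.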

Phi-4-reasoning-plus is also supported by Ollama, llama.cpp, and any framework compatible with Phi-4.

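📚 Detailed Documentation

Model Overview

Attribute	Details
Developer	Microsoft Research
Description	Phi-4-reasoning-plus is a state-of-the-art open-weight reasoning model, fine-tuned from Phi-4 with supervised fine-tuning combined with reinforcement learning. The supervised fine-tuning dataset includes synthetic prompts and high-quality filtered data from public-domain websites, focused on math, science, and coding skills, together with alignment data for safety and responsible AI. Because it was also trained with reinforcement learning, the model is more accurate but generates on average 50% more tokens, so latency rises accordingly.
Architecture	Same base model as the previously released Phi-4: a 14B-parameter dense decoder-only Transformer
Input	Text, best suited to prompts in chat format
Context length	32k tokens
GPUs	32 H100-80G
Training time	2.5 days
Training data	16B tokens, about 8.3B unique tokens
Output	Generated text in response to the input; each response has two parts, a chain-of-thought reasoning block and a summary block
Dates	January 2025 - April 2025
Status	Static model trained on an offline dataset; publicly available data has a cutoff of March 2025 and earlier
Release date	April 30, 2025
License	MIT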
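The compute figures in the table imply a rough training budget. The following back-of-the-envelope sketch assumes that all 32 GPUs ran for the full 2.5 days; the token ratio is simply total tokens divided by unique tokens and says nothing about how repeats were actually distributed.

```python
gpus = 32
training_days = 2.5
gpu_hours = gpus * training_days * 24        # 32 * 2.5 * 24 = 1920 GPU-hours

total_tokens = 16e9                          # 16B training tokens
unique_tokens = 8.3e9                        # about 8.3B unique tokens
avg_passes = total_tokens / unique_tokens    # about 1.93 passes per unique token on average

print(f"{gpu_hours:.0f} GPU-hours, about {avg_passes:.2f} passes per unique token")
```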

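Intended Use

Use type	Details
Primary use cases	The model is intended to accelerate research on language models and to serve as a building block for generative AI features. It is suited to general-purpose AI systems and applications (primarily in English) that require memory- or compute-constrained environments, latency-bound scenarios, or reasoning and logic.
Out-of-scope use cases	The model was designed and tested for math reasoning only, and was not specifically designed or evaluated for all downstream uses. Developers should keep the common limitations of language models in mind when choosing a use case, and should evaluate and mitigate accuracy, safety, and fairness issues before deploying the model in a specific downstream use case, particularly in high-risk scenarios. Developers should comply with applicable laws and regulations (including privacy and trade compliance) and consult the "Responsible AI Considerations" section for further guidance.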

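Data Overview

Training datasets

The training data is a mixture of question-answer and chat-format data in math, science, and coding. The chat prompts come from filtered, high-quality web data and may be rewritten and processed through a synthetic data generation pipeline. Data that improves truthfulness and safety is also included.

Benchmark datasets

Phi-4-reasoning-plus was evaluated with the open-source Eureka evaluation suite and internal benchmarks, specifically:

Reasoning tasks: AIME 2025, 2024, 2023, and 2022; GPQA-Diamond; OmniMath; LiveCodeBench; 3SAT and TSP; BA Calendar; Maze and SpatialMap.
General benchmarks: Kitab, IFEval and ArenaHard, PhiBench, FlenQA, HumanEvalPlus, MMLU-Pro.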