Phi-4-reasoning-plus Open Model - Free Support for Advanced Reasoning in Math, Science, and Coding

Phi 4 Reasoning Plus

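Developed by Microsoft

Phi-4-reasoning-plus is an advanced open-weight reasoning model developed by Microsoft Research. It is built on Phi-4 and optimized through supervised fine-tuning and reinforcement learning, with a focus on advanced reasoning in math, science, and coding.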

Large Language Model

Transformers

Multilingual support | Open-source license: MIT | #math-reasoning-optimization #long-context-processing #RL-fine-tuning

Downloads: 19.83k

Published: 4/17/2025

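Model Overview

Phi-4-reasoning-plus is an efficient reasoning model optimized with chain-of-thought trace datasets and reinforcement learning. It is particularly strong at math reasoning, science questions, and code generation.

Model Features

Advanced reasoning

Optimized through supervised fine-tuning and reinforcement learning, with particular strength in math reasoning, science questions, and code generation.

Long-context processing

Supports a context length of 32k tokens and stays coherent and logically consistent over long sequences.

Structured responses

Each response is split into a chain-of-thought reasoning block and a summary block, giving a detailed reasoning process followed by a precise solution.

Efficient training

Trained on 16 billion tokens of data in 2.5 days on 32 H100-80G GPUs.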

Model Capabilities

Math reasoning

Science question answering

Code generation

Chat conversation

Logical reasoning

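Use Cases

Education

Math problem solving

Solves complex math problems, such as olympiad-level questions.

Scores 78.0 on AIME 2025

Science question answering

Answers graduate-level science questions.

Scores 68.9 on GPQA-Diamond

Programming

Code generation

Generates functional code and solves competitive programming problems.

Scores 53.1 on LiveCodeBench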

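🚀 Phi-4-reasoning-plus Model Card

Phi-4-reasoning-plus is a reasoning model fine-tuned from Phi-4. It performs well on reasoning-intensive tasks such as math, science, and coding, with high accuracy and strong reasoning ability.

🚀 Quick Start

To get the most out of the model, run inference with temperature=0.8, top_k=50, top_p=0.95, and do_sample=True. For more complex queries, set max_new_tokens=32768 to allow a longer chain of thought (CoT).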
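
The recommended settings can be kept in one place so every call uses the same values. This is a minimal sketch: the helper name `sampling_kwargs` and the `complex_query` flag are hypothetical, and the 4096-token default is borrowed from the `transformers` example in this card, not from an official recommendation.

```python
def sampling_kwargs(complex_query: bool = False) -> dict:
    """Return keyword arguments for a generate()-style call.

    The sampling values are the ones quoted in the Quick Start text.
    The 4096 default for simple queries is an assumption for illustration.
    """
    return {
        "do_sample": True,
        "temperature": 0.8,
        "top_k": 50,
        "top_p": 0.95,
        # Allow a longer chain of thought only when the query needs it.
        "max_new_tokens": 32768 if complex_query else 4096,
    }
```

The result can be unpacked into a call such as `model.generate(inputs, **sampling_kwargs(complex_query=True))`.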

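✨ Key Features

Advanced architecture: a 14B-parameter dense decoder-only Transformer, the same base architecture as the previously released Phi-4.
High-quality training: trained on 16B tokens (about 8.3B unique tokens) over 2.5 days.
Long-context processing: supports a context length of 32k tokens, and still performed well in experiments that extended it to 64k tokens.
High accuracy: the reinforcement learning stage improves accuracy, but the model generates on average 50% more tokens, so latency rises accordingly.
Strong multi-task performance: performs well on both reasoning tasks and general benchmarks, outperforming many models with far more parameters.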
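
Because the prompt and the generated chain of thought share one context window, it can help to cap the generation budget per request. This is a minimal sketch, assuming "32k" means 32,768 tokens; the helper name `clamp_new_tokens` is hypothetical and not part of any library.

```python
CONTEXT_WINDOW = 32 * 1024  # assumption: "32k" means 32,768 tokens


def clamp_new_tokens(prompt_tokens: int, requested: int,
                     window: int = CONTEXT_WINDOW) -> int:
    """Limit the generation budget so prompt + output fits in the window."""
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    # Never return a negative budget when the prompt alone overflows.
    return max(0, min(requested, window - prompt_tokens))
```

Passing `window=64 * 1024` models the extended 64k setting mentioned above, which the card describes only as an experiment.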

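📦 Installation

The source documentation does not describe installation steps, so none can be given here.

💻 Usage Examples

Basic Usage

Because of the nature of the training data, always run inference with the ChatML template and the following system prompt: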

<|im_start|>system<|im_sep|>
You are Phi, a language model trained by Microsoft to help users. Your role as an assistant involves thoroughly exploring questions through a systematic thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution using the specified format: <think> {Thought section} </think> {Solution section}. In the Thought section, detail your reasoning process in steps. Each step should include detailed considerations such as analysing questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The Solution section should be logical, accurate, and concise and detail necessary steps needed to reach the conclusion. Now, try to solve the following question through the above guidelines:<|im_end|>
<|im_start|>user<|im_sep|>
What is the derivative of x^2?<|im_end|>
<|im_start|>assistant<|im_sep|>

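The template above can also be assembled by hand when a tokenizer chat template is not available. This is a minimal sketch that copies the layout exactly as printed here, including the line breaks. Whether those line breaks are significant to the model is an assumption, so the tokenizer's own chat template should be treated as authoritative. The function name `build_prompt` is hypothetical.

```python
def build_prompt(system: str, user: str) -> str:
    """Assemble a single-turn ChatML prompt in the layout shown above."""
    turns = [("system", system), ("user", user)]
    parts = [
        f"<|im_start|>{role}<|im_sep|>\n{content}<|im_end|>\n"
        for role, content in turns
    ]
    # Leave the assistant turn open so the model continues from here.
    parts.append("<|im_start|>assistant<|im_sep|>\n")
    return "".join(parts)
```
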
Advanced Usage

Using the `transformers` library

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-4-reasoning-plus")
model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-4-reasoning-plus", device_map="auto", torch_dtype="auto")

messages = [
    {"role": "system", "content": "You are Phi, a language model trained by Microsoft to help users. Your role as an assistant involves thoroughly exploring questions through a systematic thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution using the specified format: <think> {Thought section} </think> {Solution section}. In the Thought section, detail your reasoning process in steps. Each step should include detailed considerations such as analysing questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The Solution section should be logical, accurate, and concise and detail necessary steps needed to reach the conclusion. Now, try to solve the following question through the above guidelines:"},
    {"role": "user", "content": "What is the derivative of x^2?"},
]
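# Render the messages with the model's chat template and open the assistant turn.
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")

# Sample with the recommended decoding parameters.
outputs = model.generate(
    inputs.to(model.device),
    max_new_tokens=4096,  # raise toward 32768 for harder queries that need a longer CoT
    temperature=0.8,
    top_k=50,
    top_p=0.95,
    do_sample=True,
)
# Decode the full sequence: the prompt followed by the generated reasoning and solution.
print(tokenizer.decode(outputs[0]))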
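
The response is structured as `<think> ... </think>` followed by the solution, so the two parts can be separated after decoding. This is a minimal sketch with a hypothetical helper, `split_response`. It should be applied to the newly generated text only, because the system prompt itself contains a literal `<think> {Thought section} </think>` placeholder that would otherwise be matched first.

```python
import re

_THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_END = "<|im_end|>"


def split_response(generated: str) -> tuple[str, str]:
    """Split generated text into (thought, solution).

    Returns an empty thought if no <think> block is found.
    """
    text = generated.strip()
    # Drop a trailing end-of-turn marker if the decoder kept it.
    if text.endswith(_END):
        text = text[: -len(_END)]
    match = _THINK.search(text)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()
```

With the example above, the generated portion could be obtained as `tokenizer.decode(outputs[0][inputs.shape[-1]:])`, assuming `inputs` is the tensor of prompt token IDs.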

Using `vllm`

vllm serve microsoft/Phi-4-reasoning-plus --enable-reasoning --reasoning-parser deepseek_r1
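
Once the server is running, it can be queried over its OpenAI-compatible HTTP API. This is a minimal sketch using only the standard library. It assumes the server listens on vLLM's default `http://localhost:8000`; the helper names `build_payload` and `ask` are hypothetical. With a reasoning parser enabled, the returned message is expected to carry the reasoning separately from the final answer, but the exact field name depends on the vLLM version and should be checked against its documentation.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: default host and port
MODEL = "microsoft/Phi-4-reasoning-plus"


def build_payload(system_prompt: str, question: str,
                  max_tokens: int = 4096) -> dict:
    """Build a chat-completions request body with the recommended sampling."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
        "temperature": 0.8,
        "top_p": 0.95,
        "top_k": 50,  # vLLM extension; not part of the standard OpenAI schema
        "max_tokens": max_tokens,
    }


def ask(system_prompt: str, question: str) -> dict:
    """POST the request and return the first choice's message object."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(system_prompt, question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]
```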

Phi-4-reasoning-plus is also supported by Ollama, llama.cpp, and any framework compatible with Phi-4.

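📚 Detailed Documentation

Model Overview

Attribute	Details
Developer	Microsoft Research
Description	Phi-4-reasoning-plus is a state-of-the-art open-weight reasoning model fine-tuned from Phi-4 with supervised fine-tuning followed by reinforcement learning. The supervised fine-tuning dataset includes synthetic prompts and high-quality filtered data from public-domain websites, focused on math, science, and coding skills, plus alignment data for safety and Responsible AI. The reinforcement learning stage improves accuracy, but the model generates on average 50% more tokens, so latency rises accordingly.
Architecture	Same base model as the previously released Phi-4: a 14B-parameter dense decoder-only Transformer
Input	Text, best suited to prompts in chat format
Context length	32k tokens
GPUs	32 H100-80G
Training time	2.5 days
Training data	16B tokens, about 8.3B unique tokens
Output	Generated text in response to the input; each response has two parts, a chain-of-thought reasoning block and a summary block
Dates	January 2025 - April 2025
Status	Static model trained on offline datasets; publicly available data has a cutoff of March 2025 and earlier
Release date	April 30, 2025
License	MIT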
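
The training figures in the table imply a few derived quantities. This is a back-of-the-envelope sketch. It assumes that the 2.5 days are wall-clock time with all 32 GPUs busy throughout, and that 16B is the total number of tokens processed. The table does not state either assumption, nor which training stages the figures cover.

```python
GPUS = 32
DAYS = 2.5
TOTAL_TOKENS = 16e9    # "16B tokens"
UNIQUE_TOKENS = 8.3e9  # "about 8.3B unique tokens"

# Total accelerator time under the wall-clock assumption.
gpu_hours = GPUS * DAYS * 24

# Average number of times each unique token was seen.
avg_passes = TOTAL_TOKENS / UNIQUE_TOKENS

# Implied average throughput per GPU.
tokens_per_gpu_second = TOTAL_TOKENS / (gpu_hours * 3600)

print(f"GPU-hours: {gpu_hours:.0f}")
print(f"Average passes over unique data: {avg_passes:.2f}")
print(f"Tokens per GPU-second: {tokens_per_gpu_second:.0f}")
```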

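Intended Use

Use type	Details
Primary use cases	The model is intended to accelerate research on language models and to serve as a building block for generative AI features. It suits general-purpose AI systems and applications (primarily in English) that need to run in memory- or compute-constrained environments, meet low-latency requirements, and perform reasoning.
Out-of-scope use cases	The model was designed and tested only for math reasoning; it was not specifically designed or evaluated for all downstream uses. Developers should consider the common limitations of language models when choosing use cases, and should evaluate and mitigate accuracy, safety, and fairness issues before using the model in a specific downstream use case, particularly in high-risk scenarios. Developers should comply with applicable laws and regulations (including privacy and trade compliance) and refer to the "Responsible AI Considerations" section for further guidance.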

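Data Overview

Training Datasets

The training data is a mixture of Q&A and chat-format data in math, science, and coding. The chat prompts come from filtered high-quality web data and may be rewritten and processed through a synthetic data generation pipeline. Data that improves truthfulness and safety is also included.

Benchmark Datasets

Phi-4-reasoning-plus was evaluated with the open-source Eureka evaluation suite and internal benchmarks, specifically:

Reasoning tasks: AIME 2025, 2024, 2023, and 2022; GPQA-Diamond; OmniMath; LiveCodeBench; 3SAT and TSP; BA Calendar; Maze and SpatialMap.
General-purpose benchmarks: Kitab; IFEval and ArenaHard; PhiBench; FlenQA; HumanEvalPlus; MMLU-Pro.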