ReasonFlux-F1-32B open-source large language model: fine-tuned for enhanced reasoning, with strong results on reasoning tasks

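ReasonFlux F1

Developed by Gen-Verse

ReasonFlux-F1-32B is a hierarchical large language model built on scaled thought templates. It is fine-tuned on template-augmented reasoning trajectories and performs strongly on reasoning tasks.

Large language model

Transformers

License: other #math-reasoning #thought-template-reasoning #competition-problem-solving

Downloads: 123

Released: 3/21/2025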

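Model overview

ReasonFlux-F1-32B is a large language model that follows a template-augmented reasoning paradigm. It is fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-32B and targets complex reasoning tasks.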

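Model features

Template-augmented reasoning

Uses a template-augmented reasoning paradigm to improve performance on complex reasoning tasks

Hierarchical reasoning

Handles complex problems hierarchically, decomposing them and solving them step by step

High-performance reasoning

Matches or exceeds the other 32B models in the reported evaluation on MATH500, AIME 2024, AIME 2025, and GPQA-Diamond (Pass@1)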

Model capabilities

Solving complex math problems

Logical reasoning

Multi-step question answering

Long-text understanding

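Use cases

Math competitions

Solving AIME competition problems

Solves complex problems from the American Invitational Mathematics Examination (AIME)

Reaches 76.7% Pass@1 accuracy on AIME 2024

Academic research

Answering GPQA-Diamond questions

Solves questions at the GPQA Diamond difficulty level

Reaches 67.2% Pass@1 accuracy on GPQA-Diamond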

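🚀 ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates

ReasonFlux is a template-augmented reasoning paradigm that enables a 32B model to outperform o1-mini and the DeepSeek-R1 distilled models on reasoning tasks.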

📦 Model information

| Property | Details |
| --- | --- |
| Library name | transformers |
| License | other |
| Base model | deepseek-ai/DeepSeek-R1-Distill-Qwen-32B |
| Tags | llama-factory, full, generated_from_trainer |
| Model name | ReasonFlux-F1-32B |

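📊 Reasoning task performance comparison

| Task (Pass@1, %) | ReasonFlux-F1-32B | ReasonFlux-Zero-32B | R1-Distill-32B | o1-mini | LIMO-32B | s1-32B |
| --- | --- | --- | --- | --- | --- | --- |
| MATH500 | 96.0 | 91.2 | 94.3 | 90.0 | 90.6 | 93.0 |
| AIME 2024 | 76.7 | 56.7 | 72.6 | 56.7 | 50.0 | 56.7 |
| AIME 2025 | 53.3 | 37.2 | 46.67 | 50.8 | 37.2 | 49.3 |
| GPQA-Diamond | 67.2 | 61.2 | 62.1 | 60.0 | 65.2 | 59.6 |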
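To make the comparison easier to read, the following minimal sketch computes the gap between ReasonFlux-F1-32B and the strongest other model in each row. The scores are copied from the comparison table above; the helper name `margin_over_best_baseline` is illustrative and does not come from the ReasonFlux repository.

```python
# Pass@1 scores (percent) copied from the comparison table above.
MODELS = ["ReasonFlux-F1-32B", "ReasonFlux-Zero-32B", "R1-Distill-32B",
          "o1-mini", "LIMO-32B", "s1-32B"]
ROWS = {
    "MATH500":      [96.0, 91.2, 94.3, 90.0, 90.6, 93.0],
    "AIME 2024":    [76.7, 56.7, 72.6, 56.7, 50.0, 56.7],
    "AIME 2025":    [53.3, 37.2, 46.67, 50.8, 37.2, 49.3],
    "GPQA-Diamond": [67.2, 61.2, 62.1, 60.0, 65.2, 59.6],
}
TARGET = "ReasonFlux-F1-32B"


def margin_over_best_baseline(task_scores, target=TARGET):
    """Return (best other model, target score minus that model's score)."""
    baselines = {m: s for m, s in task_scores.items() if m != target}
    best_model = max(baselines, key=baselines.get)
    return best_model, round(task_scores[target] - baselines[best_model], 2)


# Map each task to its (strongest baseline, margin in percentage points).
margins = {
    task: margin_over_best_baseline(dict(zip(MODELS, values)))
    for task, values in ROWS.items()
}

for task, (model, gap) in margins.items():
    print(f"{task}: +{gap} points over {model}")
```

On these numbers the smallest margin is on MATH500 and the largest on AIME 2024.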

📝 Introducing ReasonFlux-F1-32B

ReasonFlux-F1-32B is our state-of-the-art reasoning LLM, obtained by fine-tuning on template-augmented reasoning trajectories collected from ReasonFlux-Zero.

GitHub repository: [Gen-Verse/ReasonFlux](https://github.com/Gen-Verse/ReasonFlux)
Paper: ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
Dataset: [Gen-Verse/ReasonFlux-F1-SFT](https://huggingface.co/datasets/Gen-Verse/ReasonFlux-F1-SFT)

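📈 Evaluation results

We report evaluation results for ReasonFlux-F1-32B on challenging reasoning tasks, including AIME2024, AIME2025, MATH500, and GPQA-Diamond. For a fair comparison, all of the LLMs below were evaluated with the [ReasonFlux-F1](https://github.com/Gen-Verse/ReasonFlux/tree/main/reasonflux-f1) evaluation scripts.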

| Model | AIME2024 Pass@1 | AIME2025 Pass@1 | MATH500 Pass@1 | GPQA Pass@1 |
| --- | --- | --- | --- | --- |
| QwQ-32B-Preview | 46.7 | 37.2 | 90.6 | 65.2 |
| LIMO-32B | 56.3 | 44.5 | 94.8 | 58.1 |
| s1-32B | 56.7 | 49.3 | 93.0 | 59.6 |
| OpenThinker-32B | 66.0 | 53.3 | 94.8 | 60.1 |
| R1-Distill-32B | 70.0 | 46.7 | 92.0 | 59.6 |
| ReasonFlux-Zero-32B | 56.7 | 37.2 | 91.2 | 61.2 |
| ReasonFlux-F1-32B | 76.7 | 53.3 | 96.0 | 67.2 |
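For readers unfamiliar with the metric, here is a minimal sketch of how a Pass@1 percentage can be computed from graded samples. It assumes the common definition (the fraction of sampled answers that are correct, averaged over problems) and is not taken from the ReasonFlux-F1 evaluation scripts; the function name `pass_at_1` and the input layout are illustrative.

```python
def pass_at_1(correct_flags_per_problem):
    """Pass@1 in percent.

    correct_flags_per_problem: one list per problem, holding a bool for each
    sampled answer (True if that sample was graded correct).
    """
    if not correct_flags_per_problem:
        raise ValueError("need at least one problem")
    # Per-problem accuracy is the share of correct samples for that problem.
    per_problem = [sum(flags) / len(flags) for flags in correct_flags_per_problem]
    # Average over problems and convert to a percentage.
    return 100.0 * sum(per_problem) / len(per_problem)


# Hypothetical grading: 30 problems, one sample each, 23 graded correct.
graded = [[True]] * 23 + [[False]] * 7
print(round(pass_at_1(graded), 1))
```

With one sample per problem this reduces to plain accuracy. As a point of arithmetic, 23 correct answers out of 30 rounds to 76.7, although the way the reported scores were sampled is not stated in this section.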

💻 Usage example

Basic usage

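from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = 'Gen-Verse/ReasonFlux-F1'

# Shard the model across 8 GPUs; set tensor_parallel_size to the number of GPUs available.
model = LLM(
    model_id,
    tensor_parallel_size=8,
)
# Loaded for reference; the prompt below is assembled by hand and does not use it.
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Allow long reasoning traces of up to 32768 generated tokens.
sampling_params = SamplingParams(
    max_tokens=32768,
)
# 2022 AIME I Problems/Problem 15
# Raw string, so LaTeX sequences such as \begin, \frac and \right are not read as escapes.
question = r"""Let \(x, y\), and \(z\) be positive real numbers satisfying the system of equations:
\[
\begin{array}{c}
\sqrt{2 x-x y}+\sqrt{2 y-x y}=1 \\
\sqrt{2 y-y z}+\sqrt{2 z-y z}=\sqrt{2} \\
\sqrt{2 z-z x}+\sqrt{2 x-z x}=\sqrt{3} .
\end{array}
\]
Then \(\left[(1-x)(1-y)(1-z)\right]^{2}\) can be written as \(\frac{m}{n}\), where \(m\) and \(n\) are relatively prime positive integers. Find \(m+n\)."""
# DeepSeek-style prompt: user turn followed by the assistant tag.
ds_prompt="<｜User｜>\n" + question + "<｜Assistant｜>\n"
output = model.generate(ds_prompt, sampling_params=sampling_params)
# Print the generated text of the first (and only) completion.
print(output[0].outputs[0].text)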
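The generated text is a long reasoning trace, so a small amount of pre- and post-processing is often useful. The sketch below wraps the prompt format used above in a helper and pulls out the last `\boxed{...}` expression from a completion. Both helper names are hypothetical, and it is an assumption that the model writes its final answer inside `\boxed{}`; check real outputs before relying on it.

```python
from typing import Optional


def build_prompt(question: str) -> str:
    """Wrap a question in the prompt format used in the example above."""
    return "<｜User｜>\n" + question + "<｜Assistant｜>\n"


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None.

    Assumes (not guaranteed by the model card) that the final answer is
    written as \\boxed{...}. Nested braces such as \\frac{1}{2} are handled
    by tracking brace depth.
    """
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # unbalanced braces, e.g. output cut off by max_tokens


# Made-up completion text, used only to illustrate the extraction.
sample_output = r"Therefore the result is \boxed{\frac{1}{2}}."
print(extract_boxed(sample_output))  # prints \frac{1}{2}
```

With the vLLM example above, `extract_boxed(output[0].outputs[0].text)` would be the natural call. A `None` result usually means the trace was truncated or the answer was formatted differently.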

📖 Citation

@article{yang2025reasonflux,
  title={ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates},
  author={Yang, Ling and Yu, Zhaochen and Cui, Bin and Wang, Mengdi},
  journal={arXiv preprint arXiv:2502.06772},
  year={2025}
}