ReasonFlux-F1-7B Open-Source Large Language Model - Strong Reasoning Across Multiple Reasoning Tasks

ReasonFlux-F1-7B

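Developed by Gen-Verse

ReasonFlux-F1-7B is a hierarchical LLM reasoning model built on scaled thought templates. It is fine-tuned on template-augmented reasoning trajectories and performs strongly on several reasoning tasks.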

Large Language Model

Transformers

License: other #math-reasoning #competition-problem-solving #thought-template-augmentation

Downloads: 291

Released: 3/22/2025

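Model Overview

The model uses a template-augmented reasoning paradigm aimed at improving LLM performance on complex reasoning tasks, particularly mathematical and logical reasoning.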

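Model Features

Template-augmented reasoning

Uses a template-augmented reasoning paradigm to improve performance on complex reasoning tasks

Hierarchical reasoning architecture

A hierarchical reasoning architecture built on scaled thought templates, able to handle complex multi-step reasoning problems

High-performance reasoning

Reported to outperform comparable models on reasoning benchmarks including AIME, MATH500, and GPQA-Diamond (see the evaluation tables on this page)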

Model Capabilities

Mathematical problem solving

Logical reasoning

Complex problem analysis

Multi-step reasoning

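Use Cases

Math competitions

Solving AIME competition problems

Solves complex problems from the American Invitational Mathematics Examination (AIME)

Pass@1 of 76.7% on AIME 2024 (the figure listed for ReasonFlux-F1-32B in the results tables on this page)

Academic research

Advanced mathematical problem solving

Solves advanced mathematics problems

Pass@1 of 96.0% on MATH500 (the figure listed for ReasonFlux-F1-32B in the results tables on this page)

Logical reasoning

Analysis of complex logic problems

Solves logic problems that require multi-step reasoning

Pass@1 of 67.2% on GPQA-Diamond (the figure listed for ReasonFlux-F1-32B in the results tables on this page)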

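🚀 ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates

ReasonFlux is a template-augmented reasoning paradigm that enables a 32B model to outperform o1-mini and the DeepSeek-R1 distilled models on reasoning tasks.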

Task (Pass@1)	ReasonFlux-F1-32B	ReasonFlux-Zero-32B	R1-Distill-32B	o1-mini	LIMO-32B	s1-32B
MATH500	96.0	91.2	94.3	90.0	90.6	93.0
AIME 2024	76.7	56.7	72.6	56.7	50.0	56.7
AIME 2025	53.3	37.2	46.67	50.8	37.2	49.3
GPQA-Diamond	67.2	61.2	62.1	60.0	65.2	59.6

🚀 Quick Start

Quick start with vLLM

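from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = 'Gen-Verse/ReasonFlux-F1-7B'

# tensor_parallel_size must evenly divide the model's attention-head count;
# 1 (single GPU) is a safe default for a 7B model. Raise it to match your hardware.
model = LLM(
    model_id,
    tensor_parallel_size=1,
)
# Loaded for convenience (e.g. token counting); the prompt below is built by hand.
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Allow long reasoning traces; all other sampling settings use vLLM defaults.
sampling_params = SamplingParams(
    max_tokens=32768,
)
# 2022 AIME I Problems/Problem 15
# Raw string so LaTeX backslashes (\b, \f, \r, ...) are not read as escape sequences.
question = r"""Let \(x, y\), and \(z\) be positive real numbers satisfying the system of equations:
\[
\begin{array}{c}
\sqrt{2 x-x y}+\sqrt{2 y-x y}=1 \\
\sqrt{2 y-y z}+\sqrt{2 z-y z}=\sqrt{2} \\
\sqrt{2 z-z x}+\sqrt{2 x-z x}=\sqrt{3} .
\end{array}
\]
Then \(\left[(1-x)(1-y)(1-z)\right]^{2}\) can be written as \(\frac{m}{n}\), where \(m\) and \(n\) are relatively prime positive integers. Find \(m+n\)."""
# Prompt wrapped in the user/assistant special tokens of the DeepSeek-R1 distilled base model.
ds_prompt = "<｜User｜>\n" + question + "<｜Assistant｜>\n"
output = model.generate(ds_prompt, sampling_params=sampling_params)
print(output[0].outputs[0].text)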
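
The generated text is usually a long reasoning trace, so a small post-processing step helps when only the final answer is needed. The following is a minimal sketch, not part of the ReasonFlux codebase: `build_prompt` and `extract_boxed` are hypothetical helper names, and the code assumes the model ends its solution with a LaTeX `\boxed{...}` answer, which this page does not confirm.

```python
from typing import Optional


def build_prompt(question: str) -> str:
    """Wrap a question in the same user/assistant tokens used in the quick start."""
    return "<｜User｜>\n" + question + "<｜Assistant｜>\n"


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent.

    Assumption: the final answer is wrapped in \\boxed{}. Braces are matched
    by depth so nested LaTeX such as \\frac{1}{2} is returned intact.
    """
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for c in text[start + len(marker):]:
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(c)
    return None  # unbalanced braces, e.g. output truncated by max_tokens
```

With the quick-start code, this could be applied as `extract_boxed(output[0].outputs[0].text)`; a `None` result suggests the generation was cut off or used a different answer format.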

✨ Key Features

ReasonFlux-F1-7B is our SOTA-level reasoning LLM, fine-tuned on template-augmented reasoning trajectories from ReasonFlux-Zero.

GitHub repository: [Gen-Verse/ReasonFlux](https://github.com/Gen-Verse/ReasonFlux)
Paper: ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
Dataset: [Gen-Verse/ReasonFlux-F1-SFT](https://huggingface.co/datasets/Gen-Verse/ReasonFlux-F1-SFT)

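📚 Documentation

Evaluation Results

We present evaluation results for ReasonFlux-F1-32B on challenging reasoning tasks including AIME2024, AIME2025, MATH500, and GPQA-Diamond. For a fair comparison, we report every model's results as produced by the [ReasonFlux-F1](https://github.com/Gen-Verse/ReasonFlux) evaluation scripts.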

Model	AIME2024@pass1	AIME2025@pass1	MATH500@pass1	GPQA@pass1
QwQ-32B-Preview	46.7	37.2	90.6	65.2
LIMO-32B	56.3	44.5	94.8	58.1
s1-32B	56.7	49.3	93.0	59.6
OpenThinker-32B	66.0	53.3	94.8	60.1
R1-Distill-32B	70.0	46.7	92.0	59.6
ReasonFlux-Zero-32B	56.7	37.2	91.2	61.2
ReasonFlux-F1-32B	76.7	53.3	96.0	67.2
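
The pass@1 columns can be read as the percentage of problems answered correctly on the first sampled attempt. The sketch below illustrates that arithmetic under two stated assumptions: one attempt is scored per problem, and AIME 2024 is evaluated on 30 problems (AIME I and II). `pass_at_1` is a hypothetical helper, not a function from the ReasonFlux evaluation scripts, which may average over several samples instead.

```python
from typing import Sequence


def pass_at_1(first_attempt_correct: Sequence[bool]) -> float:
    """Percentage of problems whose first sampled answer is correct."""
    if not first_attempt_correct:
        raise ValueError("need at least one problem")
    return 100.0 * sum(first_attempt_correct) / len(first_attempt_correct)


# Hypothetical outcome: 23 of 30 problems solved on the first attempt.
results = [True] * 23 + [False] * 7
print(f"{pass_at_1(results):.1f}")  # 76.7
```

Under those assumptions, a score of 76.7 corresponds to 23 of 30 problems and 53.3 to 16 of 30, so a one-problem difference moves an AIME score by about 3.3 points.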

📄 License

License type: other

📦 Model Information

Property	Details
Library name	transformers
Base model	deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
Tags	llama-factory, full, generated_from_trainer
Model name	ReasonFlux-F1-7B

📖 Citation

@article{yang2025reasonflux,
  title={ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates},
  author={Yang, Ling and Yu, Zhaochen and Cui, Bin and Wang, Mengdi},
  journal={arXiv preprint arXiv:2502.06772},
  year={2025}
}