🚀 STILL-3-1.5B-preview: A Slow-Thinking Reasoning Model
We release STILL-3-1.5B-preview, a slow-thinking reasoning model that reaches 39.33% accuracy on the AIME benchmark. We applied reinforcement learning to a 1.5B-parameter model and observed that its performance keeps improving as the number of training steps grows. To make our work easier to reproduce and to advance research in this area, we open-source the code, the model, and the data.
Code: https://github.com/RUCAIBox/Slow_Thinking_with_LLMs
🚀 Quick Start
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_path = "RUC-AIBOX/STILL-3-1.5B-preview"

# The tokenizer is only needed to build the chat-formatted prompt; vLLM loads the weights itself.
tokenizer = AutoTokenizer.from_pretrained(model_path)

question = "Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\\theta),$ where $r > 0$ and $0 \\le \\theta < 2 \\pi.$"

# Build the prompt with the model's chat template.
input_prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    tokenize=False,
    add_generation_prompt=True,
)

llm = LLM(model=model_path, tensor_parallel_size=1, dtype="bfloat16")
sampling_params = SamplingParams(
    temperature=0.6,
    top_p=0.95,
    max_tokens=32768,
    seed=42,
    skip_special_tokens=False,
)

responses = llm.generate([input_prompt], sampling_params)
print(responses[0].outputs[0].text)
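If vLLM is not available, the same prompt can also be run with plain transformers generation. The snippet below is a minimal sketch under that assumption; it reuses the `question` variable from the example above, and `max_new_tokens` is kept smaller than the vLLM setting only to keep the example quick.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "RUC-AIBOX/STILL-3-1.5B-preview"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16, device_map="auto")

# Apply the chat template and tokenize in one step.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Sampled decoding with the same temperature / top-p as the vLLM example.
outputs = model.generate(inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.95)

# Strip the prompt tokens before decoding the generated answer.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))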
✨ Key Features
We evaluate the model on four benchmarks: MATH, AIME, OMNI, and LiveAOPS. For MATH and AIME, we use sampled decoding with temperature 0.6 and top-p 0.95, draw 64 samples per question, and report the average accuracy (a sketch of this scoring protocol follows the results table below). For OMNI and LiveAOPS (August-November 2024), we randomly sample a subset of problems whose answers are integers to simplify automatic evaluation, and evaluate with greedy decoding. After training, STILL-3-1.5B-preview achieves substantial improvements: its AIME accuracy rises from 28.67% to 39.33%, a relative gain of 37.18%.
|  | MATH | AIME | OMNI | LiveAOPS | Avg. |
| --- | --- | --- | --- | --- | --- |
| Base model | 84.04 | 28.67 | 25.60 | 33.33 | 42.91 |
| STILL-3-1.5B-preview | 85.48 | 39.33 | 33.00 | 39.50 | 49.33 |
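As a concrete illustration of the sampled-decoding protocol described above, the sketch below shows how per-question accuracy can be averaged over 64 samples. The helpers generate_answer and is_correct are hypothetical placeholders for generation and answer checking; the actual evaluation pipeline is in the linked repository.

def average_accuracy(problems, n_samples=64):
    # problems: list of (question, reference_answer) pairs.
    # For each question, score n_samples sampled responses (temperature 0.6, top-p 0.95)
    # and report the mean accuracy over all samples, as in the table above.
    total, correct = 0, 0
    for question, reference in problems:
        for seed in range(n_samples):
            answer = generate_answer(question, seed=seed)  # hypothetical generation helper
            correct += int(is_correct(answer, reference))  # hypothetical answer checker
            total += 1
    return 100.0 * correct / total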
📚 Documentation
If our report is helpful for your research, please cite it as follows:
@article{Slow_Thinking_with_LLMs_3_Preview,
title={STILL-3-1.5B-preview: Enhancing Slow Thinking Abilities of Small Models through Reinforcement Learning},
author={RUCAIBox STILL Team},
url={https://github.com/RUCAIBox/Slow_Thinking_with_LLMs},
year={2025}
}
@article{Slow_Thinking_with_LLMs_1,
title={Enhancing LLM Reasoning with Reward-guided Tree Search},
author={Jiang, Jinhao and Chen, Zhipeng and Min, Yingqian and Chen, Jie and Cheng, Xiaoxue and Wang, Jiapeng and Tang, Yiru and Sun, Haoxiang and Deng, Jia and Zhao, Wayne Xin and Liu, Zheng and Yan, Dong and Xie, Jian and Wang, Zhongyuan and Wen, Ji-Rong},
journal={arXiv preprint arXiv:2411.11694},
year={2024}
}
@article{Slow_Thinking_with_LLMs_2,
title={Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems},
author={Min, Yingqian and Chen, Zhipeng and Jiang, Jinhao and Chen, Jie and Deng, Jia and Hu, Yiwen and Tang, Yiru and Wang, Jiapeng and Cheng, Xiaoxue and Song, Huatong and Zhao, Wayne Xin and Liu, Zheng and Wang, Zhongyuan and Wen, Ji-Rong},
journal={arXiv preprint arXiv:2412.09413},
year={2024}
}