Mistral-Small-24B-Instruct-2501-reasoning開源數學推理模型，提升數學推理能力

首頁

Mistral Small 24B Instruct 2501 Reasoning

由yentinglin開發

基於Mistral-Small-24B-Instruct-2501微調的數學推理模型，優化了數學推理能力

大型語言模型

Safetensors

英語開源協議:Apache-2.0 #數學推理優化 #高準確率推理 #競賽數學解題

下載量 1,689

發布時間 : 2/15/2025

模型概述

該模型是針對數學推理任務優化的版本，在多個數學數據集上進行了微調，旨在提升其推理能力。

模型特點

數學推理優化

針對數學推理任務進行了專門優化，提升瞭解決數學問題的能力

多數據集微調

在OpenR1-Math-220k和s1K-1.1等多個數學數據集上進行了微調

高性能推理

在多項數學評估中表現出色，如MATH-500數據集上達到95%的pass@1準確率

模型能力

數學問題解答

複雜推理任務處理

數學證明生成

數學競賽題解答

使用案例

教育

數學競賽輔導

幫助學生準備數學競賽如AIME等

在AIME 2024測試中達到66.67%的pass@1準確率

數學學習助手

解答各類數學問題，提供解題步驟

在MATH-500測試中達到95%的pass@1準確率

研究

數學推理研究

用於數學推理能力的研究和評估

在GPQA Diamond測試中達到62.02%的pass@1準確率

🚀 Mistral-Small-Reasoning

Mistral-Small-Reasoning是基於mistralai/Mistral-Small-24B-Instruct-2501微調的模型，專門針對數學推理任務進行了優化。它在多個數據集上進行了微調，以增強推理能力，可用於文本生成任務。

📚 詳細文檔

模型詳情

開發者：Yenting Lin
資助方：Ubitus
模型類型：用於推理的指令調優語言模型
語言：英語（en）
許可證：Apache 2.0
微調基礎模型：mistralai/Mistral-Small-24B-Instruct-2501

屬性	詳情
模型類型	用於推理的指令調優語言模型
訓練數據	- open-r1/OpenR1-Math-220k - yentinglin/s1K-1.1-trl-format - simplescaling/s1K-1.1
評估指標	準確率
基礎模型	mistralai/Mistral-Small-24B-Instruct-2501
任務類型	文本生成
標籤	推理

模型評估結果

任務類型	數據集名稱	數據集類型	Pass@1 值	來源
文本生成	MATH-500	MATH	0.95	yentinglin/zhtw-reasoning-eval-leaderboard
文本生成	AIME 2025	AIME	0.5333	yentinglin/zhtw-reasoning-eval-leaderboard
文本生成	AIME 2024	AIME	0.6667	yentinglin/zhtw-reasoning-eval-leaderboard
文本生成	GPQA Diamond	GPQA	0.62022	yentinglin/zhtw-reasoning-eval-leaderboard

與其他模型的對比評估

Pass@1	# Params	MATH-500	AIME 2025	AIME 2024	GPQA Diamond
Mistral-24B-Reasoning (Ours)	24B	95.0	53.33	66.67	62.02
Mistral-24B-Instruct	24B	70.6	-	-	45.3
s1.1-32B	32B	93.2	40.0	56.7	61.62
LIMO	32B	94.8	36.67	57.1	59.09
DeepSeek-R1-Distill-Llama-70B	70B	94.5	46.67	70.0	65.2
DeepSeek-R1-Distill-Qwen-32B	32B	94.3	60.0	72.6	62.1
DeepSeek-R1	671B	97.3	70.0	72.6	71.5
o1	-	96.4	79.0	-	75.7
o3-mini (high)	-	97.9	86.5	-	77.2
o3-mini (medium)	-	97.3	76.5	-	74.9

🚀 快速開始

模型演示可在 twllm.com 查看，可使用 vLLM 或 sglang 進行推理。

🔧 技術細節

訓練環境

模型使用 4×8 H100 GPUs 進行訓練，由 Ubitus 提供。

訓練配置

查看訓練配置

axolotl 版本: a98526ef7843a3e8aa006f260e6b4fb8912b5f1a

base_model: mistralai/Mistral-Small-24B-Instruct-2501

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

datasets:
  - path: yentinglin/s1K-1.1-trl-format
    type: chat_template
    chat_template: tokenizer_default
    field_messages: messages
    message_field_role: role
    message_field_content: content
  - path: open-r1/OpenR1-Math-220k
    type: chat_template
    chat_template: tokenizer_default
    field_messages: messages
    message_field_role: from
    message_field_content: value
dataset_prepared_path:
val_set_size: 0.0
output_dir: ./placeholder/

sequence_len: 32768
sample_packing: true
eval_sample_packing: False
pad_to_sequence_len: true

wandb_project: Reasoning
wandb_entity:
wandb_watch:
wandb_name: Mistral-24B-SFT-220k
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 5
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: auto
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
logging_steps: 1
flash_attention: true

warmup_ratio: 0.1
saves_per_epoch: 2
weight_decay: 0.0
deepspeed: deepspeed_configs/zero3_bf16.json
special_tokens:
  pad_token: "<pad>"

評估

評估代碼可在 Hugging Face Open-R1 查看。注意，AIME 25 數據集已更新為完整版本，可在 AIME 2025 獲取。評估結果為多次運行的平均值，詳細評估信息可查看此處。

📄 許可證

本模型採用 Apache 2.0 許可證。

📖 引用

如果使用此模型，請引用以下內容：

@article{yentinglin2025_mistral_reasoning,
  author = {Yenting Lin},
  title = {Mistral-Small-24B-Instruct-2501-reasoning},
  journal = {Hugging Face},
  year = {2025},
  url = {https://huggingface.co/yentinglin/Mistral-Small-24B-Instruct-2501-reasoning}
}