# Mistral-Small-Reasoning
This is an instruction-tuned language model for reasoning, fine-tuned from Mistral-Small-24B-Instruct-2501. It is optimized for mathematical reasoning and achieves strong pass@1 scores on benchmarks such as MATH-500, AIME 2024/2025, and GPQA Diamond (see Evaluation below).
## Quick Start
A demo is available at twllm.com, and inference can be run using vLLM or SGLang.
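For example, offline inference with vLLM's Python API might look like the sketch below. The model ID matches the repository in the citation at the end of this card; the sampling settings are illustrative assumptions, not recommended values.

```python
from vllm import LLM, SamplingParams

# A 24B model needs substantial GPU memory; pass tensor_parallel_size > 1 to shard across GPUs.
llm = LLM(model="yentinglin/Mistral-Small-24B-Instruct-2501-reasoning")

# Illustrative sampling settings; tune temperature and max_tokens for your workload.
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=4096)

messages = [
    {"role": "user", "content": "Prove that the product of two odd integers is odd."},
]

# Recent vLLM versions expose a chat interface that applies the tokenizer's chat template.
outputs = llm.chat(messages, params)
print(outputs[0].outputs[0].text)
```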
## Features
- Instruction-tuned: Specifically optimized for mathematical reasoning tasks.
- Fine-tuned on multiple datasets: Including OpenR1-Math-220k and s1K-1.1.
- Strong benchmark results: achieves high pass@1 scores on MATH-500, AIME 2024/2025, and GPQA Diamond (see Evaluation below).
## Documentation
### Model Details
### Training Details
The model was trained using 4×8 H100 GPUs, provided by Ubitus.

The full axolotl training config is shown below.

```yaml
# axolotl version: a98526ef7843a3e8aa006f260e6b4fb8912b5f1a
base_model: mistralai/Mistral-Small-24B-Instruct-2501

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

datasets:
  - path: yentinglin/s1K-1.1-trl-format
    type: chat_template
    chat_template: tokenizer_default
    field_messages: messages
    message_field_role: role
    message_field_content: content
  - path: open-r1/OpenR1-Math-220k
    type: chat_template
    chat_template: tokenizer_default
    field_messages: messages
    message_field_role: from
    message_field_content: value

dataset_prepared_path:
val_set_size: 0.0
output_dir: ./placeholder/

sequence_len: 32768
sample_packing: true
eval_sample_packing: False
pad_to_sequence_len: true

wandb_project: Reasoning
wandb_entity:
wandb_watch:
wandb_name: Mistral-24B-SFT-220k
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 5
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: auto
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
logging_steps: 1
flash_attention: true

warmup_ratio: 0.1
saves_per_epoch: 2
weight_decay: 0.0
deepspeed: deepspeed_configs/zero3_bf16.json

special_tokens:
  pad_token: "<pad>"
```
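Both datasets use axolotl's `chat_template` format with `chat_template: tokenizer_default`, so each example's `messages` list is rendered with the base model's built-in chat template before packing. A minimal sketch of that rendering step is shown below (the conversation is a placeholder, not taken from either dataset):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-Small-24B-Instruct-2501")

# Placeholder conversation in the `messages` format expected by the config above.
messages = [
    {"role": "user", "content": "What is 17 * 24?"},
    {"role": "assistant", "content": "17 * 24 = 408."},
]

# Render with the tokenizer's default chat template,
# which is what `chat_template: tokenizer_default` selects.
text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)
```

With `train_on_inputs: false`, the prompt tokens are masked out of the loss, so training signal comes only from the assistant turns (the reasoning traces).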
## Evaluation
The evaluation code is available at Hugging Face Open-R1. Note that I have updated the AIME 25 dataset to the full set, available at AIME 2025.
Our results below are averaged over multiple runs. See our eval details here.

| Pass@1 | # Params | MATH-500 | AIME 2025 | AIME 2024 | GPQA Diamond |
|---|---|---|---|---|---|
| Mistral-24B-Reasoning (Ours) | 24B | 95.0 | 53.33 | 66.67 | 62.02 |
| Mistral-24B-Instruct | 24B | 70.6 | - | - | 45.3 |
| s1.1-32B | 32B | 93.2 | 40.0 | 56.7 | 61.62 |
| LIMO | 32B | 94.8 | 36.67 | 57.1 | 59.09 |
| DeepSeek-R1-Distill-Llama-70B | 70B | 94.5 | 46.67 | 70.0 | 65.2 |
| DeepSeek-R1-Distill-Qwen-32B | 32B | 94.3 | 60.0 | 72.6 | 62.1 |
| DeepSeek-R1 | 671B | 97.3 | 70.0 | 72.6 | 71.5 |
| o1 | - | 96.4 | 79.0 | - | 75.7 |
| o3-mini (high) | - | 97.9 | 86.5 | - | 77.2 |
| o3-mini (medium) | - | 97.3 | 76.5 | - | 74.9 |
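The pass@1 numbers above are averaged over multiple sampled runs per benchmark. As a small illustration of that aggregation (this is not the Open-R1 evaluation code, just the arithmetic):

```python
from statistics import mean

def averaged_pass_at_1(run_results: list[list[bool]]) -> float:
    """Average pass@1 across runs.

    run_results[i][j] is True if run i solved problem j.
    Each run's accuracy is computed first, then the runs are averaged.
    """
    return 100 * mean(mean(run) for run in run_results)

# Toy example: 3 runs over 4 problems -> accuracies 0.75, 0.75, 0.5 -> pass@1 = 66.67
runs = [
    [True, True, False, True],
    [True, False, True, True],
    [False, True, True, False],
]
print(f"pass@1 = {averaged_pass_at_1(runs):.2f}")
```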
## License
The model is licensed under Apache 2.0.
## Citation
If you use this model, please cite:
```bibtex
@article{yentinglin2025_mistral_reasoning,
  author  = {Yenting Lin},
  title   = {Mistral-Small-24B-Instruct-2501-reasoning},
  journal = {Hugging Face},
  year    = {2025},
  url     = {https://huggingface.co/yentinglin/Mistral-Small-24B-Instruct-2501-reasoning}
}
```
## Disclaimer
This model is provided "as-is" and without warranties of any kind. Users are solely responsible for evaluating the accuracy and suitability of the outputs. The developers assume no liability for any direct or indirect damages arising from its use.
The model is strictly not intended for high-risk applications such as medical diagnosis, legal advice, or financial investment. For such use cases, please consult qualified professionals.