Magistral-Small-2506 open-source reasoning model: multilingual, long reasoning chains, efficient and practical

Magistral Small 2506

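Developed by mistralai

Magistral-Small-2506 is a small, efficient reasoning model built on Mistral Small 3.1. It has 24 billion parameters and supports multiple languages and long chains of reasoning.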

Large language model

Safetensors

Multilingual · License: Apache-2.0 · #long-chain-reasoning #multilingual-AI #efficient-local-deployment

Downloads: 13.51k

Released: 6/4/2025

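Model overview

Magistral-Small-2506 is a small, efficient reasoning model with multilingual support and long-chain reasoning, suited to local deployment and a range of applications.

Model features

Efficient reasoning

The model can work through a long chain of reasoning before giving its answer, which suits complex tasks.

Multilingual support

Supports dozens of languages, including English, French, German, and Chinese.

Local deployment

Once quantized, the model fits on a single RTX 4090 or a MacBook with 32GB of RAM.

Large context window

The model has a 128k context window, but a maximum length of 40k is recommended for best performance.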

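Model capabilities

Text generation

Multilingual support

Long-chain reasoning

Local deployment

Use cases

Text generation

Multilingual text generation

Generates text in multiple languages for internationalized applications.

High-quality multilingual text output

Complex reasoning

Math problem solving

Solves complex math problems, including ones that need multi-step reasoning.

Accurate answers together with the reasoning behind them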

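🚀 Magistral-Small-2506 model card

Magistral-Small-2506 builds on Mistral Small 3.1 (2503) and adds reasoning capabilities: it underwent SFT on traces from Magistral Medium, followed by RL. The result is a small, efficient reasoning model with 24 billion parameters.

Magistral Small can be deployed locally: once quantized, it fits on a single RTX 4090 or a MacBook with 32GB of RAM.

Learn more about Magistral in our blog post.

The model was introduced in the paper Magistral.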

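🚀 Quick start

Magistral-Small-2506 can be deployed locally; once quantized, it fits on a single RTX 4090 or a MacBook with 32GB of RAM. The sections below cover the steps for each usage scenario.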

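✨ Key features

Reasoning: capable of long chains of reasoning before providing an answer.
Multilingual: supports dozens of languages, including English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, and Farsi.
Apache 2.0 license: an open license that allows use and modification for both commercial and non-commercial purposes.
Context window: a 128k context window, but performance may degrade past 40k. We therefore recommend setting the maximum model length to 40k.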

📊 Benchmark results

Model	AIME24 pass@1	AIME25 pass@1	GPQA Diamond	Livecodebench (v5)
Magistral Medium	73.59%	64.95%	70.83%	59.36%
Magistral Small	70.68%	62.76%	68.18%	55.84%
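
To read the table at a glance, the gap between the two models on each benchmark can be computed directly from the reported scores. The snippet below is only an illustration: the dictionary layout and function name are hypothetical, and the numbers are copied from the table.

```python
# Scores copied from the benchmark table (percent).
SCORES = {
    "AIME24 pass@1": {"medium": 73.59, "small": 70.68},
    "AIME25 pass@1": {"medium": 64.95, "small": 62.76},
    "GPQA Diamond": {"medium": 70.83, "small": 68.18},
    "Livecodebench (v5)": {"medium": 59.36, "small": 55.84},
}


def gaps(scores: dict) -> dict:
    """Return Medium minus Small, in percentage points, per benchmark."""
    return {name: round(s["medium"] - s["small"], 2) for name, s in scores.items()}


GAPS = gaps(SCORES)
for name, gap in GAPS.items():
    print(f"{name}: {gap:.2f} points")
```

On these figures, Magistral Small trails Magistral Medium by between roughly 2 and 4 percentage points on every benchmark listed.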

💡 Sampling parameters

Please make sure to use the following parameters:

top_p: 0.95
temperature: 0.7
max_tokens: 40960
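
One way to avoid forgetting these values is to keep them in a single place and merge them into every request. The following is a minimal sketch; `SAMPLING_DEFAULTS` and `build_request` are hypothetical names, not part of any library.

```python
# Recommended sampling parameters, as listed above.
SAMPLING_DEFAULTS = {"top_p": 0.95, "temperature": 0.7, "max_tokens": 40960}


def build_request(model: str, messages: list, **overrides) -> dict:
    """Merge the recommended defaults with optional per-call overrides."""
    return {"model": model, "messages": messages, **SAMPLING_DEFAULTS, **overrides}


request = build_request(
    "mistralai/Magistral-Small-2506",
    [{"role": "user", "content": "What is 17 * 23?"}],
)
print(request["temperature"], request["top_p"], request["max_tokens"])
```

The resulting dictionary can be unpacked into an OpenAI-compatible client call, for example `client.chat.completions.create(**request)`.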

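💻 Usage examples

Basic chat template

We highly recommend including the default system prompt used during RL for the best results. You can edit and customize it for your specific use case.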

<s>[SYSTEM_PROMPT]system_prompt

A user will ask you to solve a task. You should first draft your thinking process (inner monologue) until you have derived the final answer. Afterwards, write a self-contained summary of your thoughts (i.e. your summary should be succinct but contain all the critical steps you needed to reach the conclusion). You should use Markdown to format your response. Write both your thoughts and summary in the same language as the task posed by the user. NEVER use \boxed{} in your response.

Your thinking process must follow the template below:
<think>
Your thoughts or/and draft, like working through an exercise on scratch paper. Be as casual and as long as you want until you are confident to generate a correct answer.
</think>

Here, provide a concise summary that reflects your reasoning and presents a clear final answer to the user. Don't mention that this is a summary.

Problem:

[/SYSTEM_PROMPT][INST]user_message[/INST]<think>
reasoning_traces
</think>
assistant_response</s>[INST]user_message[/INST]

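system_prompt, user_message and assistant_response are placeholders.

Depending on your use case and requirements, you can either keep the reasoning traces in multi-turn interactions or keep only the final assistant response.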
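
As a rough sketch of the second option, the reasoning trace can be removed from an assistant reply before it is appended to the conversation history. This assumes the trace appears as literal `<think>` and `</think>` tags in the response text; the helper names are hypothetical.

```python
import re

# Matches one <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_reasoning(assistant_text: str) -> str:
    """Return the assistant reply without its reasoning trace."""
    return THINK_BLOCK.sub("", assistant_text).strip()


def append_turn(history: list, assistant_text: str, keep_reasoning: bool = False) -> list:
    """Return a new history with the assistant turn added.

    The reasoning trace is dropped unless keep_reasoning is True.
    """
    content = assistant_text if keep_reasoning else strip_reasoning(assistant_text)
    return history + [{"role": "assistant", "content": content}]


reply = "<think>\n17 * 23 = 17 * 20 + 17 * 3 = 340 + 51 = 391\n</think>\nThe answer is 391."
history = [{"role": "user", "content": "What is 17 * 23?"}]
history = append_turn(history, reply)
print(history[-1]["content"])  # The answer is 391.
```

Dropping the trace keeps later turns shorter, which helps stay within the recommended 40k length; keeping it preserves the model's earlier working for follow-up questions.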

Please make sure to use mistral-common as the source of truth.
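
To make the layout of the template easier to see, here is a hand-rolled sketch that assembles the same string for a multi-turn conversation. It is for illustration only: `render_prompt` is a hypothetical helper, and it treats the control markers as plain text, so real tokenization should still go through mistral-common.

```python
def render_prompt(system_prompt: str, turns: list) -> str:
    """Lay out a conversation following the chat template shown earlier.

    `turns` is a list of (user_message, assistant_response) pairs. Use None
    as the assistant_response of the last pair to leave it open for generation.
    """
    parts = [f"<s>[SYSTEM_PROMPT]{system_prompt}[/SYSTEM_PROMPT]"]
    for user_message, assistant_response in turns:
        parts.append(f"[INST]{user_message}[/INST]")
        if assistant_response is not None:
            # A completed assistant turn is closed with the end-of-sequence marker.
            parts.append(f"{assistant_response}</s>")
    return "".join(parts)


prompt = render_prompt(
    "You are a careful assistant.",
    [
        ("What is 2 + 2?", "<think>\n2 + 2 = 4\n</think>\n4"),
        ("And doubled?", None),
    ],
)
print(prompt)
```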

Usage with different frameworks

Inference

vllm (recommended): see below

In addition, the community has prepared quantized versions of the model that can be used with the following frameworks (in alphabetical order):

llama.cpp: https://huggingface.co/mistralai/Magistral-Small-2506_gguf
lmstudio (llama.cpp, MLX): https://lmstudio.ai/models/mistralai/magistral-small
ollama: https://ollama.com/library/magistral
unsloth (llama.cpp): https://huggingface.co/unsloth/Magistral-Small-2506-GGUF

Training

Fine-tuning is possible with the following frameworks (in alphabetical order):

axolotl: https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/magistral
unsloth: https://docs.unsloth.ai/basics/magistral

Other

You can also use Magistral on:

kaggle: https://www.kaggle.com/models/mistral-ai/magistral-small-2506

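vLLM (recommended)

We recommend using the vLLM library to implement production-ready inference pipelines.

📦 Installation

Make sure you install the latest vLLM code: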

pip install -U vllm \
    --pre \
    --extra-index-url https://wheels.vllm.ai/nightly

Doing so should automatically install mistral_common >= 1.6.0.

To check, run:

python -c "import mistral_common; print(mistral_common.__version__)"
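
The printed version still has to be compared against 1.6.0 by eye. A small script can do the comparison instead; the sketch below uses hypothetical helpers and compares only the numeric parts of the version string, so a pre-release such as `1.6.0rc1` is treated like `1.6.0`.

```python
def parse_version(version: str) -> tuple:
    """Turn '1.6.0' into (1, 6, 0), stopping at the first non-numeric part."""
    numbers = []
    for part in version.split("."):
        if not part.isdigit():
            break
        numbers.append(int(part))
    # Pad short versions such as '1.6' so that they compare as (1, 6, 0).
    numbers += [0] * (3 - len(numbers))
    return tuple(numbers)


def meets_minimum(version: str, minimum: str = "1.6.0") -> bool:
    """True if `version` is at least `minimum`, comparing numeric parts only."""
    return parse_version(version) >= parse_version(minimum)


try:
    import mistral_common
    installed = mistral_common.__version__
except ImportError:
    installed = None

if installed is None:
    print("mistral_common is not installed")
else:
    print(installed, "OK" if meets_minimum(installed) else "too old")
```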

You can also use a ready-to-go Docker image, or pull one from Docker Hub.

Serving the model

Start the server as follows:

vllm serve mistralai/Magistral-Small-2506 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2

Querying the model

from openai import OpenAI
from huggingface_hub import hf_hub_download

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.7
TOP_P = 0.95
MAX_TOK = 40_960

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id

def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    return system_prompt

SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

query = "Write 4 sentences, each with at least 8 words. Now make absolutely sure that every sentence has exactly one word less than the previous sentence."
# or try out other queries
# query = "Exactly how many days ago did the French Revolution start? Today is June 4th, 2025."
# query = "Think about 5 random numbers. Verify if you can combine them with addition, multiplication, subtraction or division to 133"
# query = "If it takes 30 minutes to dry 12 T-shirts in the sun, how long does it take to dry 33 T-shirts?"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": query}
]
stream = client.chat.completions.create(
  model=model,
  messages=messages,
  stream=True,
  temperature=TEMP,
  top_p=TOP_P,
  max_tokens=MAX_TOK,
)

print("client: Start streaming chat completions...")
printed_content = False

for chunk in stream:
    content = None
    # Read the text delta, if this chunk carries one
    if hasattr(chunk.choices[0].delta, "content"):
        content = chunk.choices[0].delta.content

    if content is not None:
        if not printed_content:
            printed_content = True
            print("\ncontent:", end="", flush=True)
        # Print the delta as soon as it arrives
        print(content, end="", flush=True)

# content:<think>
# Alright, I need to write 4 sentences where each one has at least 8 words and each subsequent sentence has one fewer word than the previous one.
# ...
# Final boxed answer (the four sentences):

# \[
# \boxed{
# \begin{aligned}
# &\text{1. The quick brown fox jumps over lazy dog and yells hello.} \\
# &\text{2. I saw the cat on the stair with my hat.} \\
# &\text{3. The man in the moon came down quickly today.} \\
# &\text{4. A cat sat on the mat today patiently.}
# \end{aligned}
# }
# \]

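📚 Detailed documentation

Model information

Property	Details
Base model	mistralai/Mistral-Small-3.1-24B-Instruct-2503
Supported languages	English, French, German, Spanish, Portuguese, Italian, Japanese, Korean, Russian, Chinese, Arabic, Farsi, Indonesian, Malay, Nepali, Polish, Romanian, Serbian, Swedish, Turkish, Ukrainian, Vietnamese, Hindi, Bengali
Library name	vllm
License	apache-2.0
Inference	No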