# 🚀 Mistral-Small-24B-Instruct-2501
Mistral-Small-24B-Instruct-2501 is an outstanding small large language model: with 24B parameters, it delivers performance competitive with much larger models. It can be deployed locally and suits many scenarios, such as fast-response conversational agents and low-latency function calling.
## 🚀 Quick Start
The Mistral-Small-24B-Instruct-2501 model can be used with the following frameworks:

- vllm: see details here
- transformers: see details here
## ✨ Key Features
- Multilingual: supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.
- Agent-centric: offers best-in-class agentic capabilities with native function calling and JSON output.
- Advanced reasoning: state-of-the-art conversational and reasoning capabilities.
- Apache 2.0 license: an open license allowing use and modification for both commercial and non-commercial purposes.
- Context window: a 32k context window.
- System prompts: maintains strong adherence and support for system prompts.
- Tokenizer: uses a Tekken tokenizer with a 131k vocabulary.
## 📦 Installation
### vLLM
We recommend using the vLLM library to implement production-ready inference pipelines.
Note 1: We recommend a relatively low temperature, e.g. `temperature=0.15`.

Note 2: Make sure to add a system prompt to better tailor the model to your needs. If you use the model as a general assistant, we recommend the following system prompt (a request sketch combining both notes follows it):
system_prompt = """You are Mistral Small 3, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
Your knowledge base was last updated on 2023-10-01. The current date is 2025-01-30.
When you're not sure about some information, you say that you don't have the information and don't make up anything.
If the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. \"What are some good restaurants around me?\" => \"Where are you?\" or \"When is the next flight to Tokyo\" => \"Where do you travel from?\")"""
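Putting the two notes together, here is a minimal request sketch that reuses the `system_prompt` above and the recommended temperature; the `<your-server>` placeholder matches the serving examples below, and the user message is only illustrative:

```py
import requests
import json

# Reuses system_prompt from the block above; endpoint placeholder as in the examples below.
data = {
    "model": "mistralai/Mistral-Small-24B-Instruct-2501",
    "messages": [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Plan a weekend in Paris."},
    ],
    "temperature": 0.15,  # the recommended low temperature
}
response = requests.post(
    "http://<your-server>:8000/v1/chat/completions",
    headers={"Content-Type": "application/json", "Authorization": "Bearer token"},
    data=json.dumps(data),
)
print(response.json()["choices"][0]["message"]["content"])
```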
Installation steps:

Make sure to install `vLLM >= 0.6.4`:

```
pip install --upgrade vllm
```

Also make sure to install `mistral_common >= 1.5.2`:

```
pip install --upgrade mistral_common
```
You can also use a ready-to-go Docker image or get one from Docker Hub.
#### Server deployment
We recommend using Mistral-Small-24B-Instruct-2501 in a server/client setting.
1. Spin up a server:

```
vllm serve mistralai/Mistral-Small-24B-Instruct-2501 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice
```
Note: Running Mistral-Small-24B-Instruct-2501 on a GPU requires approximately 55 GB of GPU RAM in bf16 or fp16.

2. You can test the client with a simple Python snippet:
```py
import requests
import json

url = "http://<your-server>:8000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}
model = "mistralai/Mistral-Small-24B-Instruct-2501"
messages = [
{
"role": "system",
"content": "You are a conversational agent that always answers straight to the point, always end your accurate response with an ASCII drawing of a cat."
},
{
"role": "user",
"content": "Give me 5 non-formal ways to say 'See you later' in French."
},
]
data = {"model": model, "messages": messages}
response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["content"])
# Sure, here are five non-formal ways to say "See you later" in French:
#
# 1. À plus tard
# 2. À plus
# 3. Salut
# 4. À toute
# 5. Bisous
#
# ```
# /\_/\
# ( o.o )
# > ^ <
# ```
```
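Equivalently, you can talk to the server with the `openai` Python package instead of raw `requests`; a minimal sketch, assuming `pip install openai` (the API key is a dummy, since vLLM does not check it unless configured to):

```py
from openai import OpenAI

# Point the OpenAI client at the vLLM server started above.
client = OpenAI(base_url="http://<your-server>:8000/v1", api_key="token")

resp = client.chat.completions.create(
    model="mistralai/Mistral-Small-24B-Instruct-2501",
    messages=[{"role": "user", "content": "Give me 5 non-formal ways to say 'See you later' in French."}],
    temperature=0.15,
)
print(resp.choices[0].message.content)
```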
### Ollama
Ollama can run this model locally on macOS, Windows, and Linux; a client sketch follows the quantization options below.
- 4-bit quantization (default): `ollama run mistral-small`
- 8-bit quantization: `ollama run mistral-small:24b-instruct-2501-q8_0`
- FP16: `ollama run mistral-small:24b-instruct-2501-fp16`
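Ollama also exposes an OpenAI-compatible endpoint on port 11434, so the same kind of client code works against a local instance; a minimal sketch, assuming the default port and the 4-bit `mistral-small` tag pulled above:

```py
import requests

resp = requests.post(
    "http://localhost:11434/v1/chat/completions",  # Ollama's OpenAI-compatible endpoint (default port)
    json={
        "model": "mistral-small",
        "messages": [{"role": "user", "content": "Give me 5 non-formal ways to say 'See you later' in French."}],
        "temperature": 0.15,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```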
## 💻 Usage Examples
### vLLM
#### Function calling
Mistral-Small-24B-Instruct-2501 excels at function/tool calling tasks via vLLM. An example:
<details>
<summary>Example</summary>

```py
import requests
import json
from huggingface_hub import hf_hub_download
from datetime import datetime, timedelta

url = "http://<your-server>:8000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}

model = "mistralai/Mistral-Small-24B-Instruct-2501"

def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    today = datetime.today().strftime("%Y-%m-%d")
    yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)

SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city to find the weather for, e.g. 'San Francisco'",
                    },
                    "state": {
                        "type": "string",
                        "description": "The state abbreviation, e.g. 'CA' for California",
                    },
                    "unit": {
                        "type": "string",
                        "description": "The unit for temperature",
                        "enum": ["celsius", "fahrenheit"],
                    },
                },
                "required": ["city", "state", "unit"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "rewrite",
            "description": "Rewrite a given text for improved clarity",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "description": "The input text to rewrite",
                    }
                },
            },
        },
    },
]

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": "Could you please make the below article more concise?\n\nOpenAI is an artificial intelligence research laboratory consisting of the non-profit OpenAI Incorporated and its for-profit subsidiary corporation OpenAI Limited Partnership.",
    },
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "id": "bbc5b7ede",
                "type": "function",
                "function": {
                    "name": "rewrite",
                    "arguments": '{"text": "OpenAI is an artificial intelligence research laboratory consisting of the non-profit OpenAI Incorporated and its for-profit subsidiary corporation OpenAI Limited Partnership."}',
                },
            }
        ],
    },
    {
        "role": "tool",
        "content": '{"action":"rewrite","outcome":"OpenAI is a FOR-profit company."}',
        "tool_call_id": "bbc5b7ede",
        "name": "rewrite",
    },
    {
        "role": "assistant",
        "content": "---\n\nOpenAI is a FOR-profit company.",
    },
    {
        "role": "user",
        "content": "Can you tell me what the temperature will be in Dallas, in Fahrenheit?",
    },
]

data = {"model": model, "messages": messages, "tools": tools}

response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["tool_calls"])
# [{'id': '8PdihwL6d', 'type': 'function', 'function': {'name': 'get_current_weather', 'arguments': '{"city": "Dallas", "state": "TX", "unit": "fahrenheit"}'}}]
```
</details>
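Continuing the example above, the returned tool call can be executed locally and its result appended as a `tool` message (mirroring the `rewrite` turn in the conversation) so the model can produce a final answer. A minimal sketch; `get_current_weather` here is a hypothetical stub, not part of the model card:

```py
# Hypothetical stub standing in for a real weather API.
def get_current_weather(city: str, state: str, unit: str) -> str:
    return json.dumps({"city": city, "state": state, "temperature": 85, "unit": unit})

tool_call = response.json()["choices"][0]["message"]["tool_calls"][0]
args = json.loads(tool_call["function"]["arguments"])

# Append the assistant's tool call and the tool result, then ask again.
messages += [
    {"role": "assistant", "content": "", "tool_calls": [tool_call]},
    {
        "role": "tool",
        "content": get_current_weather(**args),
        "tool_call_id": tool_call["id"],
        "name": tool_call["function"]["name"],
    },
]
data = {"model": model, "messages": messages, "tools": tools}
response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["content"])
```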
#### Offline usage
```py
from vllm import LLM
from vllm.sampling_params import SamplingParams
model_name = "mistralai/Mistral-Small-24B-Instruct-2501"
SYSTEM_PROMPT = "You are a conversational agent that always answers straight to the point, always end your accurate response with an ASCII drawing of a cat."
user_prompt = "Give me 5 non-formal ways to say 'See you later' in French."
messages = [
{
"role": "system",
"content": SYSTEM_PROMPT
},
{
"role": "user",
"content": user_prompt
},
]
# note that running this model on GPU requires over 60 GB of GPU RAM
llm = LLM(model=model_name, tokenizer_mode="mistral", tensor_parallel_size=8)
sampling_params = SamplingParams(max_tokens=512, temperature=0.15)
outputs = llm.chat(messages, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
# Sure, here are five non-formal ways to say "See you later" in French:
#
# 1. À plus tard
# 2. À plus
# 3. Salut
# 4. À toute
# 5. Bisous
#
# ```
# /\_/\
# ( o.o )
# > ^ <
# ```
```
### Transformers
If you want to use Hugging Face's transformers library to generate text with this model, you can use the following code:
```py
from transformers import pipeline
import torch

messages = [
    {"role": "user", "content": "Give me 5 non-formal ways to say 'See you later' in French."},
]
chatbot = pipeline("text-generation", model="mistralai/Mistral-Small-24B-Instruct-2501", max_new_tokens=256, torch_dtype=torch.bfloat16)
chatbot(messages)
```
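For finer control over generation, here is an equivalent lower-level sketch using `AutoModelForCausalLM` and the tokenizer's chat template; the sampling settings follow the recommendations above and are assumptions, not requirements:

```py
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "mistralai/Mistral-Small-24B-Instruct-2501"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [
    {"role": "user", "content": "Give me 5 non-formal ways to say 'See you later' in French."},
]

# Render the chat template, then generate with the recommended low temperature.
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.15)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```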
## 📚 Detailed Documentation
### Benchmark results
#### Human-evaluated benchmarks
| Category | Gemma-2-27B | Qwen-2.5-32B | Llama-3.3-70B | Gpt4o-mini |
|---|---|---|---|---|
| Mistral is better | 0.536 | 0.496 | 0.192 | 0.200 |
| Mistral is slightly better | 0.196 | 0.184 | 0.164 | 0.204 |
| Ties | 0.052 | 0.060 | 0.236 | 0.160 |
| Other is slightly better | 0.060 | 0.088 | 0.112 | 0.124 |
| Other is better | 0.156 | 0.172 | 0.296 | 0.312 |
Note:

- Side-by-side evaluations were conducted with an external third-party vendor, on a set of over 1k proprietary coding and generalist prompts.
- Evaluators were asked to select their preferred model response from anonymized generations produced by Mistral Small 3 and another model.
- We are aware that in some cases the human-judgment benchmarks differ substantially from publicly available benchmarks, but we have taken extra care to verify a fair evaluation, and we are confident that the benchmarks above are valid.
#### Publicly available benchmarks
##### Reasoning & Knowledge
| Evaluation | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|---|---|---|---|---|---|
| mmlu_pro_5shot_cot_instruct | 0.663 | 0.536 | 0.666 | 0.683 | 0.617 |
| gpqa_main_cot_5shot_instruct | 0.453 | 0.344 | 0.531 | 0.404 | 0.377 |
##### Math & Coding
| Evaluation | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|---|---|---|---|---|---|
| humaneval_instruct_pass@1 | 0.848 | 0.732 | 0.854 | 0.909 | 0.890 |
| math_instruct | 0.706 | 0.535 | 0.743 | 0.819 | 0.761 |
##### Instruction following
| Evaluation | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|---|---|---|---|---|---|
| mtbench_dev | 8.35 | 7.86 | 7.96 | 8.26 | 8.33 |
| wildbench | 52.27 | 48.21 | 50.04 | 52.73 | 56.13 |
| arena_hard | 0.873 | 0.788 | 0.840 | 0.860 | 0.897 |
| ifeval | 0.829 | 0.8065 | 0.8835 | 0.8401 | 0.8499 |
Note:

- Performance accuracy on all benchmarks was obtained through the same internal evaluation pipeline, so numbers may vary slightly from previously reported performance (Qwen2.5-32B-Instruct, Llama-3.3-70B-Instruct, Gemma-2-27B-IT).
- Judge-based evals such as Wildbench, Arena hard and MTBench were based on gpt-4o-2024-05-13.
### Basic Instruct Template (V7-Tekken)
```
<s>[SYSTEM_PROMPT]<system prompt>[/SYSTEM_PROMPT][INST]<user message>[/INST]<assistant response></s>[INST]<user message>[/INST]
```
`<system_prompt>`, `<user message>`, and `<assistant response>` are placeholders.

Please make sure to use mistral-common as the source of truth.
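For illustration only (per the note above, mistral-common remains the source of truth for tokenization), a naive single-turn rendering of this template could look like:

```py
# Illustrative only; real prompts should be built with mistral-common.
def render_v7_tekken(system_prompt: str, user_message: str) -> str:
    return f"<s>[SYSTEM_PROMPT]{system_prompt}[/SYSTEM_PROMPT][INST]{user_message}[/INST]"

print(render_v7_tekken("You are Mistral Small 3.", "Hello!"))
# <s>[SYSTEM_PROMPT]You are Mistral Small 3.[/SYSTEM_PROMPT][INST]Hello![/INST]
```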
## 🔧 Technical Details
AWQ quantization: performed by stelterlab in INT4 GEMM with AutoAWQ (by casper-hansen, https://github.com/casper-hansen/AutoAWQ/). The original weights were provided by Mistral AI.
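A hedged sketch of loading such an AWQ checkpoint with transformers; the repo id below is a placeholder (not taken from this card), and autoawq must be installed:

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; substitute the actual AWQ checkpoint.
awq_repo = "<awq-quantized-repo>"

tokenizer = AutoTokenizer.from_pretrained(awq_repo)
model = AutoModelForCausalLM.from_pretrained(awq_repo, device_map="auto")
```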
## 📄 License
This model is released under the Apache 2.0 license, which allows use and modification for both commercial and non-commercial purposes.

Additionally, if you want to learn about how we process your personal data, please read our Privacy Policy. You can learn more about Mistral Small in our blog post. The model was developed by the Mistral AI team.



