🚀 Mistral-Small-24B-Instruct-2501
Mistral-Small-24B-Instruct-2501 is an excellent small large language model: at 24B parameters it rivals much larger models in performance. It supports local deployment and suits many scenarios, such as fast-response conversational agents and low-latency function calling.
🚀 Quick Start
The Mistral-Small-24B-Instruct-2501 model can be used with the following frameworks:
- vllm: see details here
- transformers: see details here
✨ Main Features
- Multilingual: supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.
- Agent-centric: offers best-in-class agentic capabilities with native function calling and JSON output.
- Advanced reasoning: state-of-the-art conversational and reasoning capabilities.
- Apache 2.0 License: an open license allowing usage and modification for both commercial and non-commercial purposes.
- Context window: a 32k context window.
- System prompts: maintains strong adherence and support for system prompts.
- Tokenizer: uses the Tekken tokenizer with a 131k vocabulary.
📦 Installation
vLLM
The vLLM library is recommended for building production-ready inference pipelines.
Note 1: it is recommended to use a relatively low temperature, e.g. temperature=0.15.
Note 2: make sure to add a system prompt so the model better fits your needs. If you use the model as a general-purpose assistant, the following system prompt is recommended:
```py
system_prompt = """You are Mistral Small 3, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
Your knowledge base was last updated on 2023-10-01. The current date is 2025-01-30.
When you're not sure about some information, you say that you don't have the information and don't make up anything.
If the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. \"What are some good restaurants around me?\" => \"Where are you?\" or \"When is the next flight to Tokyo\" => \"Where do you travel from?\")"""
```
Installation steps:
1. Make sure you have vLLM >= 0.6.4 installed:
```
pip install --upgrade vllm
```
2. Also make sure you have mistral_common >= 1.5.2 installed:
```
pip install --upgrade mistral_common
```
You can also use a ready-made Docker image, available on Docker Hub.
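For the Docker route, here is a minimal sketch assuming the stock vllm/vllm-openai image; pick a tag that ships vLLM >= 0.6.4, and pass a Hugging Face token if the weights require authentication:
```
docker run --gpus all -p 8000:8000 \
    -e "HUGGING_FACE_HUB_TOKEN=<your-hf-token>" \
    vllm/vllm-openai:latest \
    --model mistralai/Mistral-Small-24B-Instruct-2501 \
    --tokenizer_mode mistral --config_format mistral --load_format mistral \
    --tool-call-parser mistral --enable-auto-tool-choice
```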
Server Deployment
Mistral-Small-24B-Instruct-2501 is best used in a server/client setting.
1. Spin up a server:
```
vllm serve mistralai/Mistral-Small-24B-Instruct-2501 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice
```
Note: running Mistral-Small-24B-Instruct-2501 on GPU requires about 55 GB of GPU RAM in bf16 or fp16.
2. You can test the client with the following simple Python snippet:
```py
import requests
import json
from datetime import datetime, timedelta
url = "http://<your-server>:8000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}
model = "mistralai/Mistral-Small-24B-Instruct-2501"
messages = [
{
"role": "system",
"content": "You are a conversational agent that always answers straight to the point, always end your accurate response with an ASCII drawing of a cat."
},
{
"role": "user",
"content": "Give me 5 non-formal ways to say 'See you later' in French."
},
]
data = {"model": model, "messages": messages}
response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["content"])
# Sure, here are five non-formal ways to say "See you later" in French:
#
# 1. À plus tard
# 2. À plus
# 3. Salut
# 4. À toute
# 5. Bisous
#
# ```
# /\_/\
# ( o.o )
# > ^ <
# ```
```
Ollama
Ollama can run this model locally on macOS, Windows, and Linux.
- 4-bit quantization (default):
```
ollama run mistral-small
```
- 8-bit quantization:
```
ollama run mistral-small:24b-instruct-2501-q8_0
```
- FP16:
```
ollama run mistral-small:24b-instruct-2501-fp16
```
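Once one of the tags above is pulled, Ollama also exposes a local REST API (default port 11434) that you can call programmatically. A minimal sketch, assuming the default 4-bit `mistral-small` tag:
```py
import json
import requests

# Ollama's local chat endpoint (default port 11434); adjust if you changed it.
url = "http://localhost:11434/api/chat"

payload = {
    "model": "mistral-small",  # assumes the default 4-bit tag pulled above
    "messages": [
        {"role": "user", "content": "Give me 5 non-formal ways to say 'See you later' in French."}
    ],
    "stream": False,  # return one JSON object instead of a token stream
}

response = requests.post(url, data=json.dumps(payload))
print(response.json()["message"]["content"])
```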
💻 Usage Examples
vLLM
Function Calling
Mistral-Small-24B-Instruct-2501 excels at function/tool-calling tasks via vLLM. Example:
```py
import requests
import json
from huggingface_hub import hf_hub_download
from datetime import datetime, timedelta

url = "http://<your-server>:8000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}

model = "mistralai/Mistral-Small-24B-Instruct-2501"

def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    today = datetime.today().strftime("%Y-%m-%d")
    yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)

SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city to find the weather for, e.g. 'San Francisco'",
                    },
                    "state": {
                        "type": "string",
                        "description": "The state abbreviation, e.g. 'CA' for California",
                    },
                    "unit": {
                        "type": "string",
                        "description": "The unit for temperature",
                        "enum": ["celsius", "fahrenheit"],
                    },
                },
                "required": ["city", "state", "unit"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "rewrite",
            "description": "Rewrite a given text for improved clarity",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "description": "The input text to rewrite",
                    }
                },
            },
        },
    },
]

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": "Could you please make the below article more concise?\n\nOpenAI is an artificial intelligence research laboratory consisting of the non-profit OpenAI Incorporated and its for-profit subsidiary corporation OpenAI Limited Partnership.",
    },
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "id": "bbc5b7ede",
                "type": "function",
                "function": {
                    "name": "rewrite",
                    "arguments": '{"text": "OpenAI is an artificial intelligence research laboratory consisting of the non-profit OpenAI Incorporated and its for-profit subsidiary corporation OpenAI Limited Partnership."}',
                },
            }
        ],
    },
    {
        "role": "tool",
        "content": '{"action":"rewrite","outcome":"OpenAI is a FOR-profit company."}',
        "tool_call_id": "bbc5b7ede",
        "name": "rewrite",
    },
    {
        "role": "assistant",
        "content": "---\n\nOpenAI is a FOR-profit company.",
    },
    {
        "role": "user",
        "content": "Can you tell me what the temperature will be in Dallas, in Fahrenheit?",
    },
]

data = {"model": model, "messages": messages, "tools": tools}

response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["tool_calls"])
# [{'id': '8PdihwL6d', 'type': 'function', 'function': {'name': 'get_current_weather', 'arguments': '{"city": "Dallas", "state": "TX", "unit": "fahrenheit"}'}}]
```
#### Offline Usage
```py
from vllm import LLM
from vllm.sampling_params import SamplingParams
from datetime import datetime, timedelta

# checkpoint to load
model_name = "mistralai/Mistral-Small-24B-Instruct-2501"
SYSTEM_PROMPT = "You are a conversational agent that always answers straight to the point, always end your accurate response with an ASCII drawing of a cat."
user_prompt = "Give me 5 non-formal ways to say 'See you later' in French."
messages = [
{
"role": "system",
"content": SYSTEM_PROMPT
},
{
"role": "user",
"content": user_prompt
},
]
# note that running this model on GPU requires over 60 GB of GPU RAM
llm = LLM(model=model_name, tokenizer_mode="mistral", tensor_parallel_size=8)
sampling_params = SamplingParams(max_tokens=512, temperature=0.15)
outputs = llm.chat(messages, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
# Sure, here are five non-formal ways to say "See you later" in French:
#
# 1. À plus tard
# 2. À plus
# 3. Salut
# 4. À toute
# 5. Bisous
#
# ```
# /\_/\
# ( o.o )
# > ^ <
# ```
```
Transformers
If you want to use Hugging Face's transformers library to generate text, you can use the following code:
```py
from transformers import pipeline
import torch
messages = [
{"role": "user", "content": "Give me 5 non-formal ways to say 'See you later' in French."},
]
chatbot = pipeline("text-generation", model="mistralai/Mistral-Small-24B-Instruct-2501", max_new_tokens=256, torch_dtype=torch.bfloat16)
chatbot(messages)
```
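If you need more control over generation (dtype, device placement, sampling), here is a minimal sketch using AutoTokenizer / AutoModelForCausalLM and the tokenizer's chat template; the parameter choices (max_new_tokens, temperature) are illustrative, not prescribed by this card:
```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Small-24B-Instruct-2501"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Give me 5 non-formal ways to say 'See you later' in French."},
]

# Render the chat template into input ids, then sample a completion.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256, temperature=0.15, do_sample=True)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```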
📚 Documentation
Benchmark Results
Human-Evaluated Benchmarks

| Category | Gemma-2-27B | Qwen-2.5-32B | Llama-3.3-70B | Gpt4o-mini |
|---|---|---|---|---|
| Mistral is better | 0.536 | 0.496 | 0.192 | 0.200 |
| Mistral is slightly better | 0.196 | 0.184 | 0.164 | 0.204 |
| Ties | 0.052 | 0.060 | 0.236 | 0.160 |
| Other is slightly better | 0.060 | 0.088 | 0.112 | 0.124 |
| Other is better | 0.156 | 0.172 | 0.296 | 0.312 |
Note:
- Side-by-side evaluations were conducted with an external third-party vendor, on a set of over 1k proprietary coding and generalist prompts.
- Evaluators were asked to pick the model response they preferred from anonymized generations produced by Mistral Small 3 and another model.
- We are aware that in some cases the human-judged benchmarks differ substantially from publicly available benchmarks, but we have taken extra care to verify a fair evaluation and are confident that the results above are valid.
Publicly Available Benchmarks
Reasoning & Knowledge

| Metric | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|---|---|---|---|---|---|
| mmlu_pro_5shot_cot_instruct | 0.663 | 0.536 | 0.666 | 0.683 | 0.617 |
| gpqa_main_cot_5shot_instruct | 0.453 | 0.344 | 0.531 | 0.404 | 0.377 |
Math & Coding

| Metric | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|---|---|---|---|---|---|
| humaneval_instruct_pass@1 | 0.848 | 0.732 | 0.854 | 0.909 | 0.890 |
| math_instruct | 0.706 | 0.535 | 0.743 | 0.819 | 0.761 |
Instruction Following

| Metric | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|---|---|---|---|---|---|
| mtbench_dev | 8.35 | 7.86 | 7.96 | 8.26 | 8.33 |
| wildbench | 52.27 | 48.21 | 50.04 | 52.73 | 56.13 |
| arena_hard | 0.873 | 0.788 | 0.840 | 0.860 | 0.897 |
| ifeval | 0.829 | 0.8065 | 0.8835 | 0.8401 | 0.8499 |
Note:
- Performance accuracy on all benchmarks was obtained through the same internal evaluation pipeline, so numbers may vary slightly from previously reported results (Qwen2.5-32B-Instruct, Llama-3.3-70B-Instruct, Gemma-2-27B-IT).
- Judge-based evaluations such as Wildbench, Arena hard and MTBench were based on gpt-4o-2024-05-13.
Basic Instruct Template (V7-Tekken)
```
<s>[SYSTEM_PROMPT]<system prompt>[/SYSTEM_PROMPT][INST]<user message>[/INST]<assistant response></s>[INST]<user message>[/INST]
```
Here <system prompt>, <user message> and <assistant response> are placeholders.
Please make sure to use mistral-common as the source of truth.
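To make the template concrete, here is a sketch in plain Python that renders a chat history into the V7-Tekken layout above. The `render_v7_tekken` helper is hypothetical and for intuition only; mistral-common remains the source of truth for actual tokenization:
```py
# Hypothetical helper mimicking the V7-Tekken layout shown above.
# For real use, rely on mistral-common.
def render_v7_tekken(system_prompt: str, turns: list[tuple[str, str | None]]) -> str:
    out = "<s>"
    if system_prompt:
        out += f"[SYSTEM_PROMPT]{system_prompt}[/SYSTEM_PROMPT]"
    for user_message, assistant_response in turns:
        out += f"[INST]{user_message}[/INST]"
        # A completed assistant turn is closed with </s>; a trailing None
        # leaves the prompt open for the model to generate the next response.
        if assistant_response is not None:
            out += f"{assistant_response}</s>"
    return out

print(render_v7_tekken("Answer briefly.", [("Hi!", "Hello!"), ("How are you?", None)]))
# <s>[SYSTEM_PROMPT]Answer briefly.[/SYSTEM_PROMPT][INST]Hi![/INST]Hello!</s>[INST]How are you?[/INST]
```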
🔧 Technical Details
AWQ quantization: performed by stelterlab in INT4 GEMM with AutoAWQ (developed by casper-hansen, https://github.com/casper-hansen/AutoAWQ/). Original weights provided by Mistral AI.
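AWQ INT4 checkpoints can typically be loaded straight through transformers, which detects the AWQ quantization config in the checkpoint (the autoawq package must be installed). A minimal sketch; the repo id below is a hypothetical placeholder for wherever the quantized weights are hosted:
```py
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id for the AWQ weights; substitute the actual repository.
quantized_id = "<awq-repo>/Mistral-Small-24B-Instruct-2501-AWQ"

tokenizer = AutoTokenizer.from_pretrained(quantized_id)
# transformers reads the AWQ quantization config from the checkpoint and
# dispatches to the autoawq kernels; no extra quantization arguments needed.
model = AutoModelForCausalLM.from_pretrained(quantized_id, device_map="auto")
```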
📄 License
This model is released under the Apache 2.0 license, which allows usage and modification for both commercial and non-commercial purposes.
In addition, if you want to learn more about how we process your personal data, please read our Privacy Policy. You can learn more about Mistral Small in our blog post. The model was developed by the Mistral AI team.



