Mistral Small 24B Instruct 2501 4.0bpw Exl2

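Developed by matatonic

Mistral Small 3 (2501) is a 24-billion-parameter large language model that performs strongly in the "small" category of models below 70B parameters, with multilingual support and advanced reasoning capabilities.

Large Language Model

Transformers

Multilingual support. License: Apache-2.0. Tags: #multilingual-conversational-agents #low-latency-function-calling #knowledge-dense-reasoning

Downloads: 15

Released: 1/30/2025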

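Model Overview

Mistral Small 3 is an instruction-tuned large language model with multilingual support, advanced reasoning, and function calling, suited to local deployment and domain-specific fine-tuning.

Model Features

Multilingual support

Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, and Korean.

Advanced reasoning

Offers advanced conversational and reasoning capabilities, suited to complex tasks.

Function calling support

Supports native function calling and JSON output, suited to building conversational agents and automation tools.

Local deployment

Once quantized, fits on a single RTX 4090 or a MacBook with 32GB of RAM, which suits workloads involving sensitive data.

Model Capabilities

Text generation

Multilingual conversation

Function calling

Advanced reasoning

Instruction following

Use Cases

Conversational agents

Fast-response conversational agents

For building low-latency conversational systems with multilingual interaction.

Automation tools

Function calling

Automates tasks such as weather lookups and text rewriting through function calls.

Local inference

Sensitive data handling

Lets hobbyists and organizations process sensitive data locally without relying on cloud services.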

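🚀 Model Card for Mistral-Small-24B-Instruct-2501

Mistral Small 3 (2501) sets a new benchmark in the "small" large language model category below 70B parameters. It has 24B parameters and achieves state-of-the-art capabilities comparable to larger models. This model is an instruction-fine-tuned version of the base model Mistral-Small-24B-Base-2501.

Mistral Small can be deployed locally and is exceptionally "knowledge-dense": once quantized, it fits on a single RTX 4090 or a MacBook with 32GB of RAM. It is well suited to:

Fast-response conversational agents.
Low-latency function calling.
Subject matter experts via fine-tuning.
Local inference for hobbyists and organizations handling sensitive data.

For enterprises that need specialized capabilities (increased context, particular modalities, domain-specific knowledge, etc.), we will be releasing commercial models beyond what Mistral AI contributes to the community.

This release demonstrates our commitment to open source, serving as a strong base model.

Learn more about Mistral Small in our blog post.

Model developer: Mistral AI Team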

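✨ Key Features

Multilingual: Supports many languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.
Agent-centric: Offers best-in-class agentic capabilities with native function calling and JSON output.
Advanced reasoning: State-of-the-art conversational and reasoning capabilities.
Apache 2.0 license: Open license allowing usage and modification for both commercial and non-commercial purposes.
Context window: A 32k context window.
System prompt: Maintains strong adherence to and support for system prompts.
Tokenizer: Uses a Tekken tokenizer with a 131k vocabulary size.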

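📊 Benchmark Results

Human-evaluated benchmarks

Category	Gemma-2-27B	Qwen-2.5-32B	Llama-3.3-70B	Gpt4o-mini
Mistral is better	0.536	0.496	0.192	0.200
Mistral is slightly better	0.196	0.184	0.164	0.204
Ties	0.052	0.060	0.236	0.160
Other is slightly better	0.060	0.088	0.112	0.124
Other is better	0.156	0.172	0.296	0.312

Note:

We conducted side-by-side evaluations with an external third-party vendor on a set of over 1,000 proprietary coding and generalist prompts.
Evaluators were asked to select their preferred model response from anonymized generations produced by Mistral Small 3 and another model.
We are aware that in some cases benchmarks based on human judgment differ starkly from publicly available benchmarks, but we have taken extra care to verify that the evaluation is fair. We are confident that the above benchmarks are valid.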
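The five rows of the human-evaluation table can be collapsed into an overall preference split per competitor. The snippet below is a minimal sketch of that arithmetic; the `results` dictionary and the `summarize` helper are illustrative names introduced here, and the numbers are copied from the table above.

```python
# Preference shares copied from the human-evaluation table above.
# Row order: Mistral better, Mistral slightly better, ties,
# other slightly better, other better.
results = {
    "Gemma-2-27B": [0.536, 0.196, 0.052, 0.060, 0.156],
    "Qwen-2.5-32B": [0.496, 0.184, 0.060, 0.088, 0.172],
    "Llama-3.3-70B": [0.192, 0.164, 0.236, 0.112, 0.296],
    "Gpt4o-mini": [0.200, 0.204, 0.160, 0.124, 0.312],
}


def summarize(shares):
    """Collapse five buckets into (mistral_preferred, tie, other_preferred)."""
    better, slightly_better, tie, other_slightly, other_better = shares
    return (better + slightly_better, tie, other_slightly + other_better)


for name, shares in results.items():
    win, tie, loss = summarize(shares)
    print(f"{name:14s} Mistral preferred {win:.3f} | tie {tie:.3f} | other preferred {loss:.3f}")
```

Read this way, evaluators preferred Mistral Small 3 in 0.732 of comparisons against Gemma-2-27B, while against Llama-3.3-70B the split is 0.356 for Mistral versus 0.408 for the other model.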

Publicly accessible benchmarks

Reasoning and knowledge

Evaluation	mistral-small-24B-instruct-2501	gemma-2-27b	llama-3.3-70b	qwen2.5-32b	gpt-4o-mini-2024-07-18
mmlu_pro_5shot_cot_instruct	0.663	0.536	0.666	0.683	0.617
gpqa_main_cot_5shot_instruct	0.453	0.344	0.531	0.404	0.377

Math and coding

Evaluation	mistral-small-24B-instruct-2501	gemma-2-27b	llama-3.3-70b	qwen2.5-32b	gpt-4o-mini-2024-07-18
humaneval_instruct_pass@1	0.848	0.732	0.854	0.909	0.890
math_instruct	0.706	0.535	0.743	0.819	0.761

Instruction following

Evaluation	mistral-small-24B-instruct-2501	gemma-2-27b	llama-3.3-70b	qwen2.5-32b	gpt-4o-mini-2024-07-18
mtbench_dev	8.35	7.86	7.96	8.26	8.33
wildbench	52.27	48.21	50.04	52.73	56.13
arena_hard	0.873	0.788	0.840	0.860	0.897
ifeval	0.829	0.8065	0.8835	0.8401	0.8499

Note:

Performance on all benchmarks was measured with the same internal evaluation pipeline, so the numbers may differ slightly from previously reported results (Qwen2.5-32B-Instruct, Llama-3.3-70B-Instruct, Gemma-2-27B-IT).
Judge-based evaluations such as Wildbench, Arena hard, and MTBench used gpt-4o-2024-05-13 as the judge.

Basic Instruct Template (V7-Tekken)

<s>[SYSTEM_PROMPT]<system prompt>[/SYSTEM_PROMPT][INST]<user message>[/INST]<assistant response></s>[INST]<user message>[/INST]

<system prompt>, <user message>, and <assistant response> are placeholders.

Please make sure to use mistral-common as the source of truth.
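To make the layout of the template concrete, here is a minimal sketch that renders a list of chat messages into that string form. `build_prompt` is a hypothetical helper written for illustration only: it ignores tool calls and works on plain text rather than token IDs, so mistral-common remains the reference for real use.

```python
def build_prompt(messages):
    """Render chat messages using the V7-Tekken layout shown above.

    Hypothetical helper for illustration; not part of mistral-common.
    """
    parts = ["<s>"]
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "system":
            parts.append(f"[SYSTEM_PROMPT]{content}[/SYSTEM_PROMPT]")
        elif role == "user":
            parts.append(f"[INST]{content}[/INST]")
        elif role == "assistant":
            # A completed assistant turn is closed with the end-of-sequence marker.
            parts.append(f"{content}</s>")
        else:
            raise ValueError(f"unsupported role: {role}")
    return "".join(parts)


conversation = [
    {"role": "system", "content": "Be concise."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Bye"},
]
print(build_prompt(conversation))
# <s>[SYSTEM_PROMPT]Be concise.[/SYSTEM_PROMPT][INST]Hi[/INST]Hello!</s>[INST]Bye[/INST]
```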

📦 Installation

vLLM

Make sure you install vLLM >= 0.6.4:

pip install --upgrade vllm

Also make sure you have mistral_common >= 1.5.2 installed:

pip install --upgrade mistral_common

You can also use a ready-to-go Docker image or pull one from Docker Hub.

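💻 Usage

vLLM

We recommend using this model with the vLLM library to implement production-ready inference pipelines.

Note 1: We recommend a relatively low temperature, such as temperature=0.15.

Note 2: Make sure to add a system prompt to the model to best tailor it to your needs. If you want to use the model as a general assistant, we recommend the following system prompt:

system_prompt = """You are Mistral Small 3, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
Your knowledge base was last updated on 2023-10-01. The current date is 2025-01-30.
When you're not sure about some information, you say that you don't have the information and don't make up anything.
If the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. "What are some good restaurants around me?" => "Where are you?" or "When is the next flight to Tokyo" => "Where do you travel from?")"""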
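The two notes above can be combined into a single request body. The sketch below assumes an OpenAI-style chat completions endpoint, which accepts a `temperature` field; `build_chat_payload` and the short placeholder prompts are illustrative names and values, not part of the model card.

```python
import json


def build_chat_payload(model, system_prompt, user_message, temperature=0.15):
    """Assemble an OpenAI-style chat completion request body.

    Hypothetical helper: it only builds the dictionary and sends nothing.
    """
    return {
        "model": model,
        "temperature": temperature,  # low temperature, as recommended in Note 1
        "messages": [
            {"role": "system", "content": system_prompt},  # Note 2: always set one
            {"role": "user", "content": user_message},
        ],
    }


payload = build_chat_payload(
    model="mistralai/Mistral-Small-24B-Instruct-2501",
    system_prompt="You are a helpful general-purpose assistant.",  # placeholder text
    user_message="Summarize the Apache 2.0 license in one sentence.",  # placeholder text
)
print(json.dumps(payload, indent=2))
```

Keeping the temperature as a default argument means every request uses the recommended value unless a caller overrides it deliberately.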

Server

We recommend using Mistral-Small-24B-Instruct-2501 in a server/client setting.

Start the server:

vllm serve mistralai/Mistral-Small-24B-Instruct-2501 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice

Note: Running Mistral-Small-24B-Instruct-2501 on GPU requires about 55GB of GPU RAM in bf16 or fp16.
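As a rough sanity check on memory figures like the one above, the size of the weights alone can be estimated from the parameter count and the number of bits per weight. This is a back-of-the-envelope sketch: `weight_memory_gb` is an illustrative helper, the parameter count is the rounded 24 billion stated on this card, and the result excludes the KV cache, activations, and runtime overhead, which is presumably why the quoted requirement is higher than the bf16 weight size.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 24e9  # rounded parameter count stated on this card

# 4.0 bpw is the quantization level named in this page's title.
for label, bits in [("bf16/fp16", 16), ("8-bit", 8), ("4.0 bpw", 4.0)]:
    print(f"{label:10s} ~{weight_memory_gb(NUM_PARAMS, bits):.0f} GB")
# bf16/fp16  ~48 GB
# 8-bit      ~24 GB
# 4.0 bpw    ~12 GB
```

The roughly 12 GB estimate at 4 bits per weight is consistent with the claim that a quantized model fits on a single 24 GB RTX 4090 or a 32GB MacBook, with room left for the context.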

To ping the server, you can use a simple Python snippet:

import requests
import json

url = "http://<your-server>:8000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}

model = "mistralai/Mistral-Small-24B-Instruct-2501"

messages = [
    {
        "role": "system",
        "content": "You are a conversational agent that always answers straight to the point, always end your accurate response with an ASCII drawing of a cat."
    },
    {
        "role": "user",
        "content": "Give me 5 non-formal ways to say 'See you later' in French."
    },
]

data = {"model": model, "messages": messages}

response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["content"])

# Sure, here are five non-formal ways to say "See you later" in French:
#
# 1. À plus tard
# 2. À plus
# 3. Salut
# 4. À toute
# 5. Bisous
#
# ```
#  /\_/\
# ( o.o )
#  > ^ <
# ```

Function calling

Mistral-Small-24B-Instruct-2501 is excellent at function / tool calling tasks via vLLM. For example:

Example

```py
import requests
import json
from huggingface_hub import hf_hub_download
from datetime import datetime, timedelta

url = "http://<your-server>:8000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}

model = "mistralai/Mistral-Small-24B-Instruct-2501"


def load_system_prompt(repo_id: str, filename: str) -> str:
    # Download the prompt template from the model repo and fill in its placeholders.
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    today = datetime.today().strftime("%Y-%m-%d")
    yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)


SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

# Tools the model is allowed to call, described with JSON Schema parameters.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city to find the weather for, e.g. 'San Francisco'",
                    },
                    "state": {
                        "type": "string",
                        "description": "The state abbreviation, e.g. 'CA' for California",
                    },
                    "unit": {
                        "type": "string",
                        "description": "The unit for temperature",
                        "enum": ["celsius", "fahrenheit"],
                    },
                },
                "required": ["city", "state", "unit"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "rewrite",
            "description": "Rewrite a given text for improved clarity",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "description": "The input text to rewrite",
                    }
                },
            },
        },
    },
]

# Conversation history: one completed tool round trip, then a new user question.
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": "Could you please make the below article more concise?\n\nOpenAI is an artificial intelligence research laboratory consisting of the non-profit OpenAI Incorporated and its for-profit subsidiary corporation OpenAI Limited Partnership.",
    },
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "id": "bbc5b7ede",
                "type": "function",
                "function": {
                    "name": "rewrite",
                    "arguments": '{"text": "OpenAI is an artificial intelligence research laboratory consisting of the non-profit OpenAI Incorporated and its for-profit subsidiary corporation OpenAI Limited Partnership."}',
                },
            }
        ],
    },
    {
        "role": "tool",
        "content": '{"action":"rewrite","outcome":"OpenAI is a FOR-profit company."}',
        "tool_call_id": "bbc5b7ede",
        "name": "rewrite",
    },
    {
        "role": "assistant",
        "content": "---\n\nOpenAI is a FOR-profit company.",
    },
    {
        "role": "user",
        "content": "Can you tell me what the temperature will be in Dallas, in Fahrenheit?",
    },
]

data = {"model": model, "messages": messages, "tools": tools}

response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["tool_calls"])
# [{'id': '8PdihwL6d', 'type': 'function', 'function': {'name': 'get_current_weather', 'arguments': '{"city": "Dallas", "state": "TX", "unit": "fahrenheit"}'}}]
```
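The response above only names the tool and its arguments; running the tool is left to the client. The sketch below shows one way to dispatch such a call and build the `tool` message that would be appended to the conversation. `get_current_weather` here is a hypothetical stand-in returning canned data, and `TOOL_REGISTRY` and `dispatch_tool_call` are illustrative names rather than part of vLLM or any Mistral library.

```python
import json


def get_current_weather(city, state, unit):
    """Hypothetical stand-in for a real weather lookup; returns canned data."""
    return {"city": city, "state": state, "unit": unit, "temperature": None}


# Map tool names advertised to the model onto local Python callables.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}


def dispatch_tool_call(tool_call):
    """Run one tool call and return the `tool` message to append to the chat."""
    function = tool_call["function"]
    handler = TOOL_REGISTRY[function["name"]]
    arguments = json.loads(function["arguments"])  # arguments arrive as a JSON string
    result = handler(**arguments)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "name": function["name"],
        "content": json.dumps(result),
    }


# The tool call shown in the example output above.
tool_call = {
    "id": "8PdihwL6d",
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "arguments": '{"city": "Dallas", "state": "TX", "unit": "fahrenheit"}',
    },
}
print(dispatch_tool_call(tool_call))
```

Looking handlers up in a dictionary keeps the set of callable functions explicit: a name the model invents raises a `KeyError` instead of executing arbitrary code.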

#### Offline
```py
from vllm import LLM
from vllm.sampling_params import SamplingParams

# Model repo to load; the same one used in the server examples.
model_name = "mistralai/Mistral-Small-24B-Instruct-2501"

SYSTEM_PROMPT = "You are a conversational agent that always answers straight to the point, always end your accurate response with an ASCII drawing of a cat."

user_prompt = "Give me 5 non-formal ways to say 'See you later' in French."

messages = [
    {
        "role": "system",
        "content": SYSTEM_PROMPT
    },
    {
        "role": "user",
        "content": user_prompt
    },
]

# Note that running this model on GPU requires over 60GB of GPU RAM.
# Adjust tensor_parallel_size to the number of GPUs available.
llm = LLM(model=model_name, tokenizer_mode="mistral", tensor_parallel_size=8)

sampling_params = SamplingParams(max_tokens=512, temperature=0.15)
outputs = llm.chat(messages, sampling_params=sampling_params)

print(outputs[0].outputs[0].text)
# Sure, here are five non-formal ways to say "See you later" in French:
#
# 1. À plus tard
# 2. À plus
# 3. Salut
# 4. À toute
# 5. Bisous
#
# ```
#  /\_/\
# ( o.o )
#  > ^ <
# ```
```

Transformers

If you want to use Hugging Face transformers to generate text, you can do something like this:

from transformers import pipeline
import torch

messages = [
    {"role": "user", "content": "Give me 5 non-formal ways to say 'See you later' in French."},
]
chatbot = pipeline("text-generation", model="mistralai/Mistral-Small-24B-Instruct-2501", max_new_tokens=256, torch_dtype=torch.bfloat16)
chatbot(messages)

Ollama

Ollama can run this model locally on macOS, Windows, and Linux.

ollama run mistral-small

4-bit quantization (aliased to the default):

ollama run mistral-small:24b-instruct-2501-q4_K_M

8-bit quantization:

ollama run mistral-small:24b-instruct-2501-q8_0

FP16:

ollama run mistral-small:24b-instruct-2501-fp16

📄 License

This model is released under the Apache 2.0 license. If you want to learn more about how we process your personal data, please read our Privacy Policy.
