Mistral Small 24B Instruct 2501 4.0bpw Exl2

Developed by matatonic

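Mistral Small 3 (2501) is a 24-billion-parameter large language model that performs strongly in the sub-70B "small" model category, with multilingual support and advanced reasoning.

Large Language Model

Transformers

Multilingual · Open-source license: Apache-2.0 · #multilingual-conversational-agents #low-latency-function-calling #knowledge-dense-reasoning

Downloads: 15

Release date: 1/30/2025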

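Model Overview

Mistral Small 3 is an instruction-tuned large language model with multilingual support, advanced reasoning, and function calling. It is suited to local deployment and to domain-specific fine-tuning.

Model Features

Multilingual support

Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, and Korean.

Advanced reasoning

Offers advanced conversational and reasoning capabilities, suitable for complex tasks.

Function calling

Supports native function calling and JSON output, suitable for building conversational agents and automation tools.

Local deployment

Once quantized, it fits on a single RTX 4090 or a MacBook with 32GB of RAM, which makes it suitable for processing sensitive data locally.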

Model Capabilities

Text generation

Multilingual dialogue

Function calling

Advanced reasoning

Instruction following

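Use Cases

Conversational agents

Fast-response conversational agents

Build low-latency conversational systems that support multilingual interaction.

Automation tools

Function calling

Automate tasks through function calling, such as weather lookups or text rewriting.

Local inference

Sensitive data processing

Lets hobbyists and organizations process sensitive data locally, without relying on cloud services.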

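🚀 Mistral-Small-24B-Instruct-2501 Model Card

Mistral Small 3 (2501) sets a new benchmark in the "small" large language model category (below 70B parameters). With 24B parameters, it achieves state-of-the-art capabilities comparable to much larger models! This model is an instruction-fine-tuned version of the base model Mistral-Small-24B-Base-2501.

Mistral Small can be deployed locally and is exceptionally "knowledge-dense": once quantized, it fits on a single RTX 4090 or a MacBook with 32GB of RAM. It is well suited to:

Fast-response conversational agents.
Low-latency function calling.
Subject-matter expert models via fine-tuning.
Local inference for hobbyists and for organizations handling sensitive data.

For enterprises that need specialized capabilities (increased context length, particular modalities, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.

This release demonstrates our commitment to open source and serves as a strong base model.

Learn more about Mistral Small in our blog post.

Model developer: Mistral AI Team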

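✨ Key Features

Multilingual: Supports many languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.
Agent-centric: Offers best-in-class agentic capabilities with native function calling and JSON output.
Advanced reasoning: State-of-the-art conversational and reasoning capabilities.
Apache 2.0 License: Open license allowing usage and modification for both commercial and non-commercial purposes.
Context window: A 32k context window.
System prompt: Strong adherence to and support for system prompts.
Tokenizer: Uses a Tekken tokenizer with a 131k vocabulary.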

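📊 Benchmark Results

Human-Rated Benchmarks

Outcome (fraction of comparisons)	Gemma-2-27B	Qwen-2.5-32B	Llama-3.3-70B	Gpt4o-mini
Mistral is better	0.536	0.496	0.192	0.200
Mistral is slightly better	0.196	0.184	0.164	0.204
Tie	0.052	0.060	0.236	0.160
Other model is slightly better	0.060	0.088	0.112	0.124
Other model is better	0.156	0.172	0.296	0.312

Notes:

We conducted side-by-side evaluations with an external third-party vendor, using a set of over 1,000 proprietary coding and general-purpose prompts.
Evaluators were asked to select their preferred response from anonymized outputs generated by Mistral Small 3 and another model.
We are aware that in some cases human-judgment benchmarks differ substantially from publicly available benchmarks, but we took extra care to verify that the evaluation was fair. We are confident that the benchmarks above are valid.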
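Each column of the human-rated table sums to 1, so the two "Mistral" rows can be added into one overall preference share and compared with the two "Other model" rows. The sketch below does this arithmetic in plain Python; the dictionary just copies the table values, and the names `HUMAN_EVAL` and `summarize` are illustrative, not part of any Mistral tooling.

```python
# Values copied from the human-rated table:
# (Mistral better, Mistral slightly better, tie, other slightly better, other better)
HUMAN_EVAL = {
    "Gemma-2-27B": (0.536, 0.196, 0.052, 0.060, 0.156),
    "Qwen-2.5-32B": (0.496, 0.184, 0.060, 0.088, 0.172),
    "Llama-3.3-70B": (0.192, 0.164, 0.236, 0.112, 0.296),
    "Gpt4o-mini": (0.200, 0.204, 0.160, 0.124, 0.312),
}


def summarize(row):
    """Collapse the five outcome shares into (Mistral preferred, tie, other preferred)."""
    better, slightly_better, tie, other_slightly, other_better = row
    return better + slightly_better, tie, other_slightly + other_better


for opponent, row in HUMAN_EVAL.items():
    # Each column is a distribution over outcomes, so it should sum to 1
    assert abs(sum(row) - 1.0) < 1e-9, opponent
    mistral, tie, other = summarize(row)
    print(f"{opponent}: Mistral preferred {mistral:.3f}, tie {tie:.3f}, other preferred {other:.3f}")
```

For example, against Gemma-2-27B the combined preference share for Mistral is 0.536 + 0.196 = 0.732.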

Publicly Available Benchmarks

Reasoning & Knowledge

Evaluation	mistral-small-24B-instruct-2501	gemma-2-27b	llama-3.3-70b	qwen2.5-32b	gpt-4o-mini-2024-07-18
mmlu_pro_5shot_cot_instruct	0.663	0.536	0.666	0.683	0.617
gpqa_main_cot_5shot_instruct	0.453	0.344	0.531	0.404	0.377

Math & Coding

Evaluation	mistral-small-24B-instruct-2501	gemma-2-27b	llama-3.3-70b	qwen2.5-32b	gpt-4o-mini-2024-07-18
humaneval_instruct_pass@1	0.848	0.732	0.854	0.909	0.890
math_instruct	0.706	0.535	0.743	0.819	0.761

Instruction Following

Evaluation	mistral-small-24B-instruct-2501	gemma-2-27b	llama-3.3-70b	qwen2.5-32b	gpt-4o-mini-2024-07-18
mtbench_dev	8.35	7.86	7.96	8.26	8.33
wildbench	52.27	48.21	50.04	52.73	56.13
arena_hard	0.873	0.788	0.840	0.860	0.897
ifeval	0.829	0.8065	0.8835	0.8401	0.8499

Notes:

All benchmark results were obtained with the same internal evaluation pipeline, so the numbers may differ slightly from previously reported results (Qwen2.5-32B-Instruct, Llama-3.3-70B-Instruct, Gemma-2-27B-IT).
Judge-based evaluations (Wildbench, Arena hard, and MTBench) used gpt-4o-2024-05-13 as the judge.

Basic Instruct Template (V7-Tekken)

<s>[SYSTEM_PROMPT]<system prompt>[/SYSTEM_PROMPT][INST]<user message>[/INST]<assistant response></s>[INST]<user message>[/INST]

<system prompt>, <user message>, and <assistant response> are placeholders.

Please make sure to use mistral-common as the source of truth.
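To make the placeholder structure concrete, the sketch below assembles the V7-Tekken template as a plain string from a system prompt and a list of turns. It is only a visualization aid and assumes the markers are written out literally; real tokenization encodes `<s>`, `[INST]`, and the other markers as special tokens, which is why mistral-common, not hand-built strings, should be used in practice. The function name `build_v7_tekken_prompt` is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def build_v7_tekken_prompt(system_prompt: str, turns: Sequence[Tuple[str, Optional[str]]]) -> str:
    """Render the V7-Tekken template as text (hypothetical helper, illustration only).

    `turns` holds (user message, assistant response) pairs; the last pair may use
    None as its response, leaving the prompt open for the model to complete.
    """
    parts = ["<s>", f"[SYSTEM_PROMPT]{system_prompt}[/SYSTEM_PROMPT]"]
    for user_message, assistant_response in turns:
        parts.append(f"[INST]{user_message}[/INST]")
        if assistant_response is not None:
            # Completed assistant turns are closed with the end-of-sequence marker
            parts.append(f"{assistant_response}</s>")
    return "".join(parts)


print(build_v7_tekken_prompt("SYS", [("Hi", "Hello!"), ("How are you?", None)]))
# <s>[SYSTEM_PROMPT]SYS[/SYSTEM_PROMPT][INST]Hi[/INST]Hello!</s>[INST]How are you?[/INST]
```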

📦 Installation

vLLM

Make sure to install vLLM >= 0.6.4:

pip install --upgrade vllm

Also make sure mistral_common >= 1.5.2 is installed:

pip install --upgrade mistral_common

You can also use a ready-to-go Docker image, or get one from Docker Hub.
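A small stdlib-only sketch for confirming that the minimum versions above are installed. The helper names are hypothetical, and the comparison is deliberately simplified: it reads only the leading numeric release components, so a pre-release such as 0.6.4rc1 is treated as 0.6.4. If the `packaging` library is available, its `Version` class handles those cases properly.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions stated in the installation steps
MINIMUM_VERSIONS = {"vllm": "0.6.4", "mistral_common": "1.5.2"}


def numeric_release(ver: str) -> tuple:
    """Return the leading numeric components, e.g. "0.6.4.post1" -> (0, 6, 4).

    Simplification: pre-release and local suffixes ("rc1", "+cu121") are ignored.
    """
    parts = []
    for piece in ver.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # stop at a suffix such as "4rc1" or "0+cu121"
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    a, b = numeric_release(installed), numeric_release(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "1.5" compares equal to "1.5.0"
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_environment() -> dict:
    report = {}
    for package, minimum in MINIMUM_VERSIONS.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            report[package] = "not installed"
            continue
        status = "ok" if meets_minimum(installed, minimum) else f"too old (need >= {minimum})"
        report[package] = f"{installed}: {status}"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package}: {status}")
```

Comparing numeric tuples rather than raw strings matters here: as strings, "0.10.0" would sort before "0.6.4".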

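💻 Usage Examples

vLLM

We recommend using the vLLM library to build production-ready inference pipelines.

Note 1: We recommend a relatively low temperature, such as temperature=0.15.

Note 2: Make sure to give the model a system prompt so that it better fits your needs. If you want to use the model as a general assistant, we recommend the following system prompt:

system_prompt = """You are Mistral Small 3, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
Your knowledge base was last updated on 2023-10-01. The current date is 2025-01-30.
When you're not sure about some information, you say that you don't have the information and don't make up anything.
If the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. "What are some good restaurants around me?" => "Where are you?" or "When is the next flight to Tokyo" => "Where do you travel from?")"""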
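The prompt above hard-codes the current date (2025-01-30), which goes stale after release. The function-calling example in this card fills `{today}`-style placeholders at load time with `str.format`; the sketch below applies the same idea to this general-assistant prompt. The template is the prompt above with the date swapped for a `{today}` placeholder, and `build_system_prompt` is a hypothetical helper name.

```python
from datetime import date
from typing import Optional

# The recommended system prompt, with the fixed date replaced by a {today} placeholder
SYSTEM_PROMPT_TEMPLATE = """You are Mistral Small 3, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
Your knowledge base was last updated on 2023-10-01. The current date is {today}.
When you're not sure about some information, you say that you don't have the information and don't make up anything.
If the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. "What are some good restaurants around me?" => "Where are you?" or "When is the next flight to Tokyo" => "Where do you travel from?")"""


def build_system_prompt(today: Optional[date] = None) -> str:
    """Fill in the date in ISO format, defaulting to the local date at call time."""
    today = today or date.today()
    return SYSTEM_PROMPT_TEMPLATE.format(today=today.isoformat())


messages = [
    {"role": "system", "content": build_system_prompt()},
    {"role": "user", "content": "When is the next flight to Tokyo?"},
]
```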

Server

We recommend using Mistral-Small-24B-Instruct-2501 in a server/client setting.

Start the server:

vllm serve mistralai/Mistral-Small-24B-Instruct-2501 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice

Note: Running Mistral-Small-24B-Instruct-2501 on GPU requires about 55 GB of GPU RAM in bf16 or fp16.
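A back-of-the-envelope check on these memory figures: weight memory is roughly the parameter count times bits per weight, divided by 8 to get bytes. The sketch assumes a nominal 24 × 10⁹ parameters and counts weights only. KV cache, activations, and framework overhead come on top, which plausibly accounts for the gap between the ~55 GB figure and the 48 GB of bf16 weights. At the 4.0 bits per weight of this EXL2 quantization, the weights alone come to about 12 GB.

```python
NOMINAL_PARAMS = 24e9  # assumption: the nominal "24B"; the exact count differs slightly


def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


for label, bits in [("bf16/fp16", 16), ("8-bit", 8), ("EXL2 4.0bpw", 4.0)]:
    print(f"{label}: ~{weight_memory_gb(NOMINAL_PARAMS, bits):.0f} GB of weights")
```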

To test the client, you can use this simple Python snippet:

import requests
import json
from datetime import datetime, timedelta

url = "http://<your-server>:8000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}

model = "mistralai/Mistral-Small-24B-Instruct-2501"

messages = [
    {
        "role": "system",
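        "content": "You are a conversational agent that always answers straight to the point, always end your accurate response with an ASCII drawing of a cat."
    },
    {
        "role": "user",
        "content": 'Give me 5 non-formal ways to say "See you later" in French.'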
    },
]

data = {"model": model, "messages": messages}

response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["content"])

# Sure, here are five non-formal ways to say "See you later" in French:
#
# 1. À plus tard
# 2. À plus
# 3. Salut
# 4. À toute
# 5. Bisous
#
# ```
#  /\_/\
# ( o.o )
#  > ^ <
# ```

Function calling

Mistral-Small-24B-Instruct-2501 excels at function/tool calling tasks via vLLM. For example:

Example

```py
import requests
import json
from huggingface_hub import hf_hub_download
from datetime import datetime, timedelta

url = "http://<your-server>:8000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}

model = "mistralai/Mistral-Small-24B-Instruct-2501"


def load_system_prompt(repo_id: str, filename: str) -> str:
    # Download the system prompt template from the model repo and fill in its placeholders
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    today = datetime.today().strftime("%Y-%m-%d")
    yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)


SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "The city to find the weather for, e.g. 'San Francisco'"},
                    "state": {"type": "string", "description": "The state abbreviation, e.g. 'CA' for California"},
                    "unit": {"type": "string", "description": "The unit for temperature", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city", "state", "unit"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "rewrite",
            "description": "Rewrite a given text for improved clarity",
            "parameters": {
                "type": "object",
                "properties": {"text": {"type": "string", "description": "The input text to rewrite"}},
            },
        },
    },
]

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": "Could you please make the below article more concise?\n\nOpenAI is an artificial intelligence research laboratory consisting of the non-profit OpenAI Incorporated and its for-profit subsidiary corporation OpenAI Limited Partnership.",
    },
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "id": "bbc5b7ede",
                "type": "function",
                "function": {
                    "name": "rewrite",
                    "arguments": '{"text": "OpenAI is an artificial intelligence research laboratory consisting of the non-profit OpenAI Incorporated and its for-profit subsidiary corporation OpenAI Limited Partnership."}',
                },
            }
        ],
    },
    {
        "role": "tool",
        "content": '{"action":"rewrite","outcome":"OpenAI is a FOR-profit company."}',
        "tool_call_id": "bbc5b7ede",
        "name": "rewrite",
    },
    {"role": "assistant", "content": "---\n\nOpenAI is a FOR-profit company."},
    {"role": "user", "content": "Can you tell me what the temperature is in Dallas, in Fahrenheit?"},
]

data = {"model": model, "messages": messages, "tools": tools}

response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["tool_calls"])
# [{'id': '8PdihwL6d', 'type': 'function', 'function': {'name': 'get_current_weather', 'arguments': '{"city": "Dallas", "state": "TX", "unit": "fahrenheit"}'}}]
```
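The request above only returns the model's tool call; executing it is up to the client. A minimal sketch of that step, assuming the OpenAI-style response shape printed above: parse each call's JSON `arguments`, run a matching local Python function, and build the `assistant` and `tool` messages in the same form as the conversation history in the example, ready to append to `messages` for the next request. `get_current_weather` here is a hypothetical stub with no real weather source, and `TOOL_REGISTRY` and `run_tool_calls` are illustrative names.

```python
import json


def get_current_weather(city: str, state: str, unit: str) -> dict:
    """Hypothetical stub: a real implementation would query a weather service."""
    return {"city": city, "state": state, "unit": unit, "temperature": None}


# Maps tool names declared in `tools` to local implementations
TOOL_REGISTRY = {"get_current_weather": get_current_weather}

# Response shape as printed above (only the fields this sketch reads)
SAMPLE_RESPONSE = {
    "choices": [{"message": {"role": "assistant", "content": "", "tool_calls": [
        {"id": "8PdihwL6d", "type": "function", "function": {
            "name": "get_current_weather",
            "arguments": '{"city": "Dallas", "state": "TX", "unit": "fahrenheit"}'}}]}}]
}


def run_tool_calls(response_json: dict, registry: dict) -> list:
    """Execute every tool call in a chat completion and return the follow-up messages."""
    message = response_json["choices"][0]["message"]
    tool_calls = message.get("tool_calls") or []
    # Echo the assistant turn that requested the tools, as in the example history
    followups = [{"role": "assistant", "content": message.get("content") or "", "tool_calls": tool_calls}]
    for call in tool_calls:
        name = call["function"]["name"]
        arguments = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        result = registry[name](**arguments)
        followups.append({"role": "tool", "content": json.dumps(result), "tool_call_id": call["id"], "name": name})
    return followups


for msg in run_tool_calls(SAMPLE_RESPONSE, TOOL_REGISTRY):
    print(msg["role"], msg.get("content"))
```

An unknown tool name raises a `KeyError` in this sketch; a production client would more likely report the error back to the model as tool output.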

Offline
```py
from vllm import LLM
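from vllm.sampling_params import SamplingParams

model_name = "mistralai/Mistral-Small-24B-Instruct-2501"

SYSTEM_PROMPT = "You are a conversational agent that always answers straight to the point, always end your accurate response with an ASCII drawing of a cat."

user_prompt = 'Give me 5 non-formal ways to say "See you later" in French.'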

messages = [
    {
        "role": "system",
        "content": SYSTEM_PROMPT
    },
    {
        "role": "user",
        "content": user_prompt
    },
]

# Note: running this model on GPU requires over 60 GB of GPU RAM
llm = LLM(model=model_name, tokenizer_mode="mistral", tensor_parallel_size=8)

sampling_params = SamplingParams(max_tokens=512, temperature=0.15)
outputs = llm.chat(messages, sampling_params=sampling_params)

print(outputs[0].outputs[0].text)
# Sure, here are five non-formal ways to say "See you later" in French:
#
# 1. À plus tard
# 2. À plus
# 3. Salut
# 4. À toute
# 5. Bisous
#
# ```
#  /\_/\
# ( o.o )
#  > ^ <
# ```
```

Transformers

If you want to generate text with Hugging Face transformers, you can do something like this:

from transformers import pipeline
import torch

messages = [
    {"role": "user", "content": 'Give me 5 non-formal ways to say "See you later" in French.'},
]
chatbot = pipeline("text-generation", model="mistralai/Mistral-Small-24B-Instruct-2501", max_new_tokens=256, torch_dtype=torch.bfloat16)
chatbot(messages)
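`chatbot(messages)` returns a list with one entry per input. With chat-style input, recent transformers versions return the whole conversation, with the new assistant turn appended, under `generated_text`. Assuming that shape (older versions may differ), the reply can be pulled out as below. The sample structure is illustrative, not real model output, and `extract_reply` is a hypothetical helper.

```python
def extract_reply(pipeline_output: list) -> str:
    """Return the content of the last (assistant) message from a chat pipeline result."""
    conversation = pipeline_output[0]["generated_text"]
    last = conversation[-1]
    if last.get("role") != "assistant":
        raise ValueError("expected the last message to be the assistant reply")
    return last["content"]


# Illustrative structure only, not actual model output
example_output = [{"generated_text": [
    {"role": "user", "content": 'Give me 5 non-formal ways to say "See you later" in French.'},
    {"role": "assistant", "content": "1. À plus tard\n2. À plus\n3. Salut\n4. À toute\n5. Bisous"},
]}]

print(extract_reply(example_output))
```

With the pipeline above, the same call would be `print(extract_reply(chatbot(messages)))`.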

Ollama

Ollama can run this model locally on MacOS, Windows, and Linux.

ollama run mistral-small

4-bit quantization (the default alias):

ollama run mistral-small:24b-instruct-2501-q4_K_M

8-bit quantization:

ollama run mistral-small:24b-instruct-2501-q8_0

FP16:

ollama run mistral-small:24b-instruct-2501-fp16
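Besides the interactive CLI, a running Ollama instance serves a local REST API (by default at http://localhost:11434), whose `/api/chat` endpoint accepts a model tag, a message list, and sampling `options`. The sketch below only builds such a request with the standard library, pairing the tags listed above with the low temperature recommended earlier; nothing is sent until you call `urllib.request.urlopen` yourself. `OLLAMA_TAGS` and `build_ollama_chat_request` are hypothetical names.

```python
import json
import urllib.request

# Tags from the commands above, keyed by quantization
OLLAMA_TAGS = {
    "q4_K_M": "mistral-small:24b-instruct-2501-q4_K_M",
    "q8_0": "mistral-small:24b-instruct-2501-q8_0",
    "fp16": "mistral-small:24b-instruct-2501-fp16",
}


def build_ollama_chat_request(messages, quant="q4_K_M", temperature=0.15,
                              host="http://localhost:11434") -> urllib.request.Request:
    """Build (but do not send) a non-streaming POST to Ollama's /api/chat endpoint."""
    payload = {
        "model": OLLAMA_TAGS[quant],
        "messages": messages,
        "stream": False,  # ask for one JSON response instead of a stream of chunks
        "options": {"temperature": temperature},
    }
    return urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_ollama_chat_request(
    [{"role": "user", "content": 'Give me 5 non-formal ways to say "See you later" in French.'}],
    quant="q8_0",
)
# To send it against a running server:
# with urllib.request.urlopen(request) as resp:
#     print(json.loads(resp.read())["message"]["content"])
```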

📄 License

This model is released under the Apache 2.0 license. To learn more about how we process your personal data, please read our Privacy Policy.
