🚀 Mistral-Small-3.1-24B-Instruct-2503 GGUF Models
The Mistral-Small-3.1-24B-Instruct-2503 GGUF models are high-performance language models built on the Mistral architecture, with advanced capabilities such as multilingual support and vision understanding, suitable for a wide range of natural language processing and visual analysis tasks.
🚀 Quick Start
Install Dependencies
To use this model, the vLLM library is recommended for building a production-ready inference pipeline. Make sure to install vLLM >= 0.8.1:
pip install vllm --upgrade
This should automatically install mistral_common >= 1.5.4. You can verify it with:
python -c "import mistral_common; print(mistral_common.__version__)"
You can also use a ready-made Docker image or pull one from Docker Hub.
Launch the Server
Mistral-Small-3.1-24B-Instruct-2503 is best used in a server/client setting:
- Start the server:
vllm serve mistralai/Mistral-Small-3.1-24B-Instruct-2503 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --limit_mm_per_prompt 'image=10' --tensor-parallel-size 2
Note: Running Mistral-Small-3.1-24B-Instruct-2503 on GPU requires about 55 GB of GPU memory in bf16 or fp16 (the 24B parameters alone take roughly 48 GB at 2 bytes each, before activations and the KV cache).
- Send a request to the server from a simple Python client:
import requests
import json
from huggingface_hub import hf_hub_download
from datetime import datetime, timedelta
url = "http://<your-server-url>:8000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}
model = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"
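# Download the model's system prompt from the Hugging Face repo and fill in
# its name/date placeholders.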
def load_system_prompt(repo_id: str, filename: str) -> str:
file_path = hf_hub_download(repo_id=repo_id, filename=filename)
with open(file_path, "r") as file:
system_prompt = file.read()
today = datetime.today().strftime("%Y-%m-%d")
yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
model_name = repo_id.split("/")[-1]
return system_prompt.format(name=model_name, today=today, yesterday=yesterday)
SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
image_url = "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/europe.png"
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{
"role": "user",
"content": [
{
"type": "text",
"text": "Which of the depicted countries has the best food? Which the second and third and fourth? Name the country, its color on the map and one its city that is visible on the map, but is not the capital. Make absolutely sure to only name a city that can be seen on the map.",
},
{"type": "image_url", "image_url": {"url": image_url}},
],
},
]
data = {"model": model, "messages": messages, "temperature": 0.15}
response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["content"])
# Determining the "best" food is highly subjective and depends on personal preferences. However, based on general popularity and recognition, here are some countries known for their cuisine:
# 1. **Italy** - Color: Light Green - City: Milan
# - Italian cuisine is renowned worldwide for its pasta, pizza, and various regional specialties.
# 2. **France** - Color: Brown - City: Lyon
# - French cuisine is celebrated for its sophistication, including dishes like coq au vin, bouillabaisse, and pastries like croissants and éclairs.
# 3. **Spain** - Color: Yellow - City: Bilbao
# - Spanish cuisine offers a variety of flavors, from paella and tapas to jamón ibérico and churros.
# 4. **Greece** - Not visible on the map
# - Greek cuisine is known for dishes like moussaka, souvlaki, and baklava. Unfortunately, Greece is not visible on the provided map, so I cannot name a city.
# Since Greece is not visible on the map, I'll replace it with another country known for its good food:
# 4. **Turkey** - Color: Light Green (east part of the map) - City: Istanbul
# - Turkish cuisine is diverse and includes dishes like kebabs, meze, and baklava.
✨ Key Features
- Vision understanding: the model has vision capabilities, allowing it to analyze images and provide insights based on visual content.
- Multilingual: supports dozens of languages, including English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, and Farsi.
- Agent-centric: best-in-class agentic capabilities with native function calling and JSON output.
- Advanced reasoning: state-of-the-art conversational and reasoning capabilities.
- Apache 2.0 license: an open license allowing usage and modification for both commercial and non-commercial purposes.
- Large context window: a 128k context window.
- System prompts: strong adherence to and support for system prompts.
- Tokenizer: uses a Tekken tokenizer with a 131k vocabulary.
📦 Installation
vLLM installation is covered in the Quick Start section above: run pip install vllm --upgrade (vLLM >= 0.8.1, which also installs mistral_common >= 1.5.4), or use a ready-made Docker image from Docker Hub.
💻 Usage Examples
Basic Usage
See the Python client snippet in the Quick Start section above; it demonstrates basic vision-chat usage.
Advanced Usage: Function Calling
import requests
import json
from huggingface_hub import hf_hub_download
from datetime import datetime, timedelta
url = "http://<your-url>:8000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}
model = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"
def load_system_prompt(repo_id: str, filename: str) -> str:
file_path = hf_hub_download(repo_id=repo_id, filename=filename)
with open(file_path, "r") as file:
system_prompt = file.read()
today = datetime.today().strftime("%Y-%m-%d")
yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
model_name = repo_id.split("/")[-1]
return system_prompt.format(name=model_name, today=today, yesterday=yesterday)
SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
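# OpenAI-style function schemas describing the tools the model may call.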
tools = [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "The city to find the weather for, e.g. 'San Francisco'",
},
"state": {
"type": "string",
"description": "The state abbreviation, e.g. 'CA' for California",
},
"unit": {
"type": "string",
"description": "The unit for temperature",
"enum": ["celsius", "fahrenheit"],
},
},
"required": ["city", "state", "unit"],
},
},
},
{
"type": "function",
"function": {
"name": "rewrite",
"description": "Rewrite a given text for improved clarity",
"parameters": {
"type": "object",
"properties": {
"text": {
"type": "string",
"description": "The input text to rewrite",
}
},
},
},
},
]
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{
"role": "user",
"content": "Could you please make the below article more concise?\n\nOpenAI is an artificial intelligence research laboratory consisting of the non-profit OpenAI Incorporated and its for-profit subsidiary corporation OpenAI Limited Partnership.",
},
{
"role": "assistant",
"content": "",
"tool_calls": [
{
"id": "bbc5b7ede",
"type": "function",
"function": {
"name": "rewrite",
"arguments": '{"text": "OpenAI is an artificial intelligence research laboratory consisting of the non-profit OpenAI Incorporated and its for-profit subsidiary corporation OpenAI Limited Partnership."}',
},
}
],
},
{
"role": "tool",
"content": '{"action":"rewrite","outcome":"OpenAI is a FOR-profit company."}',
"tool_call_id": "bbc5b7ede",
"name": "rewrite",
},
{
"role": "assistant",
"content": "---\n\nOpenAI is a FOR-profit company.",
},
{
"role": "user",
"content": "Can you tell me what the temperature will be in Dallas, in Fahrenheit?",
},
]
data = {"model": model, "messages": messages, "tools": tools, "temperature": 0.15}
response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["tool_calls"])
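If the request succeeds, the model should answer with a tool call rather than plain text; an illustrative result (the call ID and exact formatting will differ) might look like:
# [{'id': '...', 'type': 'function', 'function': {'name': 'get_current_weather', 'arguments': '{"city": "Dallas", "state": "TX", "unit": "fahrenheit"}'}}]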
📚 Documentation
Model Generation Details
These models were generated with llama.cpp at commit 92ecdcc0.
Ultra-Low-Bit Quantization (1-2 bit)
The latest quantization method introduces precision-adaptive quantization for ultra-low-bit models (1-2 bit), with benchmark-proven improvements on Llama-3-8B. It applies layer-specific strategies that preserve accuracy while keeping memory use extremely low.
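The layer-specific strategy essentially keeps precision-critical tensors (embeddings and the output head, as the file list further below also shows) at a higher bit-width while quantizing everything else aggressively. A purely illustrative Python sketch of such a policy follows; the tensor-name prefixes follow llama.cpp's GGUF naming, but this is a toy under stated assumptions, not the actual DynamicGate implementation:
def pick_quant_type(tensor_name: str, low_bit_type: str = "IQ2_S") -> str:
    # Keep embeddings and the output head at Q8_0; everything else gets the
    # requested ultra-low-bit type (illustrative policy only).
    if tensor_name.startswith(("token_embd", "output")):
        return "Q8_0"
    return low_bit_type

for name in ["token_embd.weight", "blk.0.attn_q.weight", "output.weight"]:
    print(f"{name} -> {pick_quant_type(name)}")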
Benchmark Context
All tests were run on Llama-3-8B-Instruct using a standard perplexity evaluation pipeline, a 2048-token context window, and the same prompt set.
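For reference, the perplexity figures below are the usual exponentiated mean negative log-likelihood over the evaluation tokens. A minimal sketch of that computation, assuming per-token log-probabilities are already available:
import math

def perplexity(token_logprobs):
    # PPL = exp(-(1/N) * sum_i log p(token_i | context)); lower is better.
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

print(perplexity([-2.1, -0.3, -1.7, -0.9]))  # toy values, ~3.49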
Quantization Performance Comparison (Llama-3-8B)
Quantization | Standard PPL | DynamicGate PPL | PPL Change | Standard Size | DG Size | Size Change | Standard Speed | DG Speed |
---|---|---|---|---|---|---|---|---|
IQ2_XXS | 11.30 | 9.84 | -12.9% | 2.5G | 2.6G | +0.1G | 234s | 246s |
IQ2_XS | 11.72 | 11.63 | -0.8% | 2.7G | 2.8G | +0.1G | 242s | 246s |
IQ2_S | 14.31 | 9.02 | -36.9% | 2.7G | 2.9G | +0.2G | 238s | 244s |
IQ1_M | 27.46 | 15.41 | -43.9% | 2.2G | 2.5G | +0.3G | 206s | 212s |
IQ1_S | 53.07 | 32.00 | -39.7% | 2.1G | 2.4G | +0.3G | 184s | 209s |
Choosing the Right Model Format
Choosing the right model format depends on your hardware capabilities and memory constraints:
Model Format | Precision | Memory Usage | Device Requirements | Best Use Case |
---|---|---|---|---|
BF16 | Highest | High | BF16-accelerated GPU/CPU | High-speed inference with reduced memory |
F16 | High | High | FP16-capable devices | GPU inference when BF16 is unavailable |
Q4_K | Medium-low | Low | CPU or low-VRAM devices | Best for memory-constrained environments |
Q6_K | Medium | Moderate | CPUs with more memory | Better accuracy among quantized models |
Q8_0 | High | Moderate | CPU or GPU with enough VRAM | Highest accuracy among quantized models |
IQ3_XS | Very low | Very low | Ultra-low-memory devices | Extreme memory efficiency at lower accuracy |
Q4_0 | Low | Low | ARM or low-memory devices | llama.cpp can optimize it for ARM devices |
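The table can also be read as a simple decision rule; a hedged sketch (the thresholds and device checks here are assumptions to adapt to your setup):
def choose_format(supports_bf16: bool, supports_fp16: bool, low_memory: bool, on_arm: bool) -> str:
    # Encodes the selection table above; real choices also depend on
    # model size versus available RAM/VRAM.
    if on_arm:
        return "q4_0"  # llama.cpp has ARM-optimized Q4_0 kernels
    if low_memory:
        return "q4_k"  # good accuracy/memory trade-off for constrained devices
    if supports_bf16:
        return "bf16"
    if supports_fp16:
        return "f16"
    return "q8_0"      # highest-accuracy quantized fallback

print(choose_format(supports_bf16=False, supports_fp16=True, low_memory=False, on_arm=False))  # f16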
Included Files and Details
Filename | Details |
---|---|
Mistral-Small-3.1-24B-Instruct-2503-bf16.gguf | Weights in BF16; useful for requantizing to other formats, and best when your device supports BF16 acceleration. |
Mistral-Small-3.1-24B-Instruct-2503-f16.gguf | Weights in F16; use when your device supports FP16, especially if BF16 is unavailable. |
Mistral-Small-3.1-24B-Instruct-2503-bf16-q8_0.gguf | Output and embedding layers kept in BF16, all other layers quantized to Q8_0; use when your device supports BF16 and you want a quantized version. |
Mistral-Small-3.1-24B-Instruct-2503-f16-q8_0.gguf | Output and embedding layers kept in F16, all other layers quantized to Q8_0. |
Mistral-Small-3.1-24B-Instruct-2503-q4_k.gguf | Output and embedding layers quantized to Q8_0, all other layers to Q4_K; good for CPU inference with limited memory. |
Mistral-Small-3.1-24B-Instruct-2503-q4_k_s.gguf | Smallest Q4_K variant; uses less memory at the cost of accuracy, for very-low-memory setups. |
Mistral-Small-3.1-24B-Instruct-2503-q6_k.gguf | Output and embedding layers quantized to Q8_0, all other layers to Q6_K. |
Mistral-Small-3.1-24B-Instruct-2503-q8_0.gguf | Fully Q8-quantized model; more accurate, but needs more memory. |
Mistral-Small-3.1-24B-Instruct-2503-iq3_xs.gguf | IQ3_XS quantization, optimized for extreme memory efficiency; for ultra-low-memory devices. |
Mistral-Small-3.1-24B-Instruct-2503-iq3_m.gguf | IQ3_M quantization, with a medium block size for better accuracy; for low-memory devices. |
Mistral-Small-3.1-24B-Instruct-2503-q4_0.gguf | Pure Q4_0 quantization, optimized for ARM devices and low-memory environments; prefer IQ4_NL for better accuracy. |
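To try one of these files locally, here is a minimal sketch using the llama-cpp-python bindings (the local path and parameter values are assumptions; tune n_ctx and n_gpu_layers to your hardware):
from llama_cpp import Llama

# Load a quantized GGUF file (hypothetical local path).
llm = Llama(
    model_path="./Mistral-Small-3.1-24B-Instruct-2503-q4_k.gguf",
    n_ctx=8192,       # context length; the model supports up to 128k if memory allows
    n_gpu_layers=-1,  # offload all layers to GPU; use 0 for CPU-only
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what GGUF is in one sentence."},
    ],
    temperature=0.15,  # low temperature, as recommended for this model
)
print(out["choices"][0]["message"]["content"])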
How to Help Test
If you find these models useful, please click "Like". You can also help test the AI-powered network monitoring assistant by choosing an AI assistant type:
- TurboLLM (GPT-4o-mini)
- HugLLM (Hugging Face open-source models)
- TestLLM (experimental CPU-only model)
What Is Being Tested
Testing the limits of small open-source models for AI network monitoring, including function calling, model size, and task-handling capability.
Example Commands
"Give me info on my websites SSL certificate"
"Check if my server is using quantum safe encyption for communication"
"Run a comprehensive security audit on my server"
"Create a cmd processor to .. (what ever you want)"
🔧 Technical Details
Basic Instruct Template (V7-Tekken)
<s>[SYSTEM_PROMPT]<system prompt>[/SYSTEM_PROMPT][INST]<user message>[/INST]<assistant response></s>[INST]<user message>[/INST]
<system_prompt>
、<user message>
和<assistant response>
为占位符。请确保使用mistral-common作为参考。
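To make the template concrete, here is a small sketch that assembles a single-turn prompt by hand (illustrative only; in practice, let mistral-common or vLLM's --tokenizer_mode mistral apply the template for you):
def build_v7_tekken_prompt(system_prompt: str, user_message: str) -> str:
    # First turn of the V7-Tekken template shown above.
    return (
        "<s>"
        f"[SYSTEM_PROMPT]{system_prompt}[/SYSTEM_PROMPT]"
        f"[INST]{user_message}[/INST]"
    )

print(build_v7_tekken_prompt("You are a helpful assistant.", "Hello!"))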
Benchmark Results
Pretrain Evals
Model | MMLU (5-shot) | MMLU Pro (5-shot CoT) | TriviaQA | GPQA Main (5-shot CoT) | MMMU |
---|---|---|---|---|---|
Small 3.1 24B Base | 81.01% | 56.03% | 80.50% | 37.50% | 59.27% |
Gemma 3 27B PT | 78.60% | 52.20% | 81.30% | 24.30% | 56.10% |
Instruct Evals - Text
Model | MMLU | MMLU Pro (5-shot CoT) | MATH | GPQA Main (5-shot CoT) | GPQA Diamond (5-shot CoT) | MBPP | HumanEval | SimpleQA (TotalAcc) |
---|---|---|---|---|---|---|---|---|
Small 3.1 24B Instruct | 80.62% | 66.76% | 69.30% | 44.42% | 45.96% | 74.71% | 88.41% | 10.43% |
Gemma 3 27B IT | 76.90% | 67.50% | 89.00% | 36.83% | 42.40% | 74.40% | 87.80% | 10.00% |
GPT4o Mini | 82.00% | 61.70% | 70.20% | 40.20% | 39.39% | 84.82% | 87.20% | 9.50% |
Claude 3.5 Haiku | 77.60% | 65.00% | 69.20% | 37.05% | 41.60% | 85.60% | 88.10% | 8.02% |
Cohere Aya-Vision 32B | 72.14% | 47.16% | 41.98% | 34.38% | 33.84% | 70.43% | 62.20% | 7.65% |
Instruct Evals - Vision
Model | MMMU | MMMU PRO | Mathvista | ChartQA | DocVQA | AI2D | MM MT Bench |
---|---|---|---|---|---|---|---|
Small 3.1 24B Instruct | 64.00% | 49.25% | 68.91% | 86.24% | 94.08% | 93.72% | 7.3 |
Gemma 3 27B IT | 64.90% | 48.38% | 67.60% | 76.00% | 86.60% | 84.50% | 7 |
GPT4o Mini | 59.40% | 37.60% | 56.70% | 76.80% | 86.70% | 88.10% | 6.6 |
Claude 3.5 Haiku | 60.50% | 45.03% | 61.60% | 87.20% | 90.00% | 92.10% | 6.5 |
Cohere Aya-Vision 32B | 48.20% | 31.50% | 50.10% | 63.04% | 72.40% | 82.57% | 4.1 |
Multilingual Evals
Model | Average | European | East Asian | Middle Eastern |
---|---|---|---|---|
Small 3.1 24B Instruct | 71.18% | 75.30% | 69.17% | 69.08% |
Gemma 3 27B IT | 70.19% | 74.14% | 65.65% | 70.76% |
GPT4o Mini | 70.36% | 74.21% | 65.96% | 70.90% |
Claude 3.5 Haiku | 70.16% | 73.45% | 67.05% | 70.00% |
Cohere Aya-Vision 32B | 62.15% | 64.70% | 57.61% | 64.12% |
Long Context Evals
Model | LongBench v2 | RULER 32K | RULER 128K |
---|---|---|---|
Small 3.1 24B Instruct | 37.18% | 93.96% | 81.20% |
Gemma 3 27B IT | 34.59% | 91.10% | 66.00% |
GPT4o Mini | 29.30% | 90.20% | 65.80% |
Claude 3.5 Haiku | 35.19% | 92.60% | 91.90% |
📄 License
This model is released under the Apache 2.0 license.
⚠️ Important Note
A relatively low temperature is recommended, such as temperature=0.15. Make sure to add a system prompt to the model to fit your needs.
💡 Usage Tip
The Transformers implementation of this model has not been thoroughly tested (it has only been "vibe-checked"), so use the vLLM library to guarantee 100% correct behavior.



