🚀 Mistral-Small-3.1-24B-Instruct-2503 GGUF Models
The Mistral-Small-3.1-24B-Instruct-2503 GGUF models are high-performance language models built on the Mistral architecture, with advanced capabilities such as multilingual support and vision understanding, suitable for a wide range of natural language processing and visual analysis tasks.
🚀 Quick Start
Installing dependencies
To use this model, it is recommended to build a production-ready inference pipeline with the vLLM library. Make sure to install vLLM >= 0.8.1:

```bash
pip install vllm --upgrade
```
Doing so should automatically install mistral_common >= 1.5.4. You can check the installed version with:

```bash
python -c "import mistral_common; print(mistral_common.__version__)"
```
You can also use a ready-made Docker image, or pull one from Docker Hub.
Starting the server
Mistral-Small-3.1-24B-Instruct-2503 is best used in a server/client setting:
- Start the server:

```bash
vllm serve mistralai/Mistral-Small-3.1-24B-Instruct-2503 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --limit_mm_per_prompt 'image=10' --tensor-parallel-size 2
```
Note: running Mistral-Small-3.1-24B-Instruct-2503 on GPU requires roughly 55 GB of GPU RAM in bf16 or fp16.
- Send requests from a client with a simple Python snippet:
```python
import requests
import json
from huggingface_hub import hf_hub_download
from datetime import datetime, timedelta

url = "http://<your-server-url>:8000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}

model = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"

def load_system_prompt(repo_id: str, filename: str) -> str:
    # Download the system prompt shipped with the model repository and
    # fill in the model name and the current dates.
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    today = datetime.today().strftime("%Y-%m-%d")
    yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)

SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

image_url = "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/europe.png"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Which of the depicted countries has the best food? Which the second and third and fourth? Name the country, its color on the map and one its city that is visible on the map, but is not the capital. Make absolutely sure to only name a city that can be seen on the map.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]

data = {"model": model, "messages": messages, "temperature": 0.15}

response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["content"])
# Determining the "best" food is highly subjective and depends on personal preferences. However, based on general popularity and recognition, here are some countries known for their cuisine:
# 1. **Italy** - Color: Light Green - City: Milan
#    - Italian cuisine is renowned worldwide for its pasta, pizza, and various regional specialties.
# 2. **France** - Color: Brown - City: Lyon
#    - French cuisine is celebrated for its sophistication, including dishes like coq au vin, bouillabaisse, and pastries like croissants and éclairs.
# 3. **Spain** - Color: Yellow - City: Bilbao
#    - Spanish cuisine offers a variety of flavors, from paella and tapas to jamón ibérico and churros.
# 4. **Greece** - Not visible on the map
#    - Greek cuisine is known for dishes like moussaka, souvlaki, and baklava. Unfortunately, Greece is not visible on the provided map, so I cannot name a city.
# Since Greece is not visible on the map, I'll replace it with another country known for its good food:
# 4. **Turkey** - Color: Light Green (east part of the map) - City: Istanbul
#    - Turkish cuisine is diverse and includes dishes like kebabs, meze, and baklava.
```
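Before sending chat requests, you may want to confirm the server is reachable. A minimal sketch below queries the OpenAI-compatible `/v1/models` route that `vllm serve` exposes, using the same placeholder URL and token as above:

```python
import requests

# List the models the server is currently serving to confirm it is up.
url = "http://<your-server-url>:8000/v1/models"
headers = {"Authorization": "Bearer token"}

response = requests.get(url, headers=headers)
response.raise_for_status()
for entry in response.json()["data"]:
    print(entry["id"])
```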
✨ Key Features
- Vision understanding: the model can analyze images and provide insights based on visual content.
- Multilingual: supports dozens of languages, including English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, and Farsi.
- Agent-centric: best-in-class agentic capabilities with native function calling and JSON output.
- Advanced reasoning: state-of-the-art conversational and reasoning capabilities.
- Open license: Apache 2.0 license, allowing use and modification for both commercial and non-commercial purposes.
- Long context window: a 128k context window.
- System prompt support: strong adherence to and support for system prompts.
- Tokenizer: uses a Tekken tokenizer with a 131k vocabulary.
📦 Installation Guide
Installing vLLM
Make sure to install vLLM >= 0.8.1:

```bash
pip install vllm --upgrade
```
After installation, you can check the mistral_common version with:

```bash
python -c "import mistral_common; print(mistral_common.__version__)"
```
You can also use a Docker image or pull one from Docker Hub.
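To confirm that the vLLM version itself meets the requirement, the same pattern works (assuming a recent release that exposes `vllm.__version__`):

```bash
python -c "import vllm; print(vllm.__version__)"
```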
💻 Usage Examples
Basic usage
The basic client request is identical to the snippet shown under Quick Start above: load the repository's SYSTEM_PROMPT.txt via huggingface_hub, build a messages list with a text-plus-image user turn, and POST it to the /v1/chat/completions endpoint with temperature=0.15.
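Because `vllm serve` speaks the OpenAI chat-completions protocol, the official `openai` Python client also works as a drop-in replacement for raw `requests` calls. A minimal sketch, assuming `pip install openai` and the same server as above (the system prompt here is a placeholder string rather than the repository's SYSTEM_PROMPT.txt):

```python
from openai import OpenAI

# Point the client at the vLLM server; the api_key only needs to match the
# token the server expects (any string if the server was started without one).
client = OpenAI(base_url="http://<your-server-url>:8000/v1", api_key="token")

response = client.chat.completions.create(
    model="mistralai/Mistral-Small-3.1-24B-Instruct-2503",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Briefly introduce yourself."},
    ],
    temperature=0.15,
)
print(response.choices[0].message.content)
```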
Advanced usage
```python
import requests
import json
from huggingface_hub import hf_hub_download
from datetime import datetime, timedelta

url = "http://<your-url>:8000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}

model = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"

def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    today = datetime.today().strftime("%Y-%m-%d")
    yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)

SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city to find the weather for, e.g. 'San Francisco'",
                    },
                    "state": {
                        "type": "string",
                        "description": "The state abbreviation, e.g. 'CA' for California",
                    },
                    "unit": {
                        "type": "string",
                        "description": "The unit for temperature",
                        "enum": ["celsius", "fahrenheit"],
                    },
                },
                "required": ["city", "state", "unit"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "rewrite",
            "description": "Rewrite a given text for improved clarity",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "description": "The input text to rewrite",
                    }
                },
            },
        },
    },
]

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": "Could you please make the below article more concise?\n\nOpenAI is an artificial intelligence research laboratory consisting of the non-profit OpenAI Incorporated and its for-profit subsidiary corporation OpenAI Limited Partnership.",
    },
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "id": "bbc5b7ede",
                "type": "function",
                "function": {
                    "name": "rewrite",
                    "arguments": '{"text": "OpenAI is an artificial intelligence research laboratory consisting of the non-profit OpenAI Incorporated and its for-profit subsidiary corporation OpenAI Limited Partnership."}',
                },
            }
        ],
    },
    {
        "role": "tool",
        "content": '{"action":"rewrite","outcome":"OpenAI is a FOR-profit company."}',
        "tool_call_id": "bbc5b7ede",
        "name": "rewrite",
    },
    {
        "role": "assistant",
        "content": "---\n\nOpenAI is a FOR-profit company.",
    },
    {
        "role": "user",
        "content": "Can you tell me what the temperature will be in Dallas, in Fahrenheit?",
    },
]

data = {"model": model, "messages": messages, "tools": tools, "temperature": 0.15}

response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["tool_calls"])
```
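To close the function-calling loop, the returned tool call has to be executed locally and its result appended as a `tool` message before asking the model again. A sketch continuing the script above, with a hypothetical local stub standing in for a real `get_current_weather` implementation:

```python
# Hypothetical stub standing in for a real weather lookup.
def get_current_weather(city: str, state: str, unit: str) -> str:
    return f"It is 25 degrees {unit} in {city}, {state}."

tool_call = response.json()["choices"][0]["message"]["tool_calls"][0]
arguments = json.loads(tool_call["function"]["arguments"])
result = get_current_weather(**arguments)

# Record the assistant's tool call and the tool result, then query again.
messages.append({"role": "assistant", "content": "", "tool_calls": [tool_call]})
messages.append(
    {
        "role": "tool",
        "content": result,
        "tool_call_id": tool_call["id"],
        "name": tool_call["function"]["name"],
    }
)

data = {"model": model, "messages": messages, "tools": tools, "temperature": 0.15}
response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["content"])
```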
📚 Documentation
Model generation details

This model was generated with llama.cpp at commit `92ecdcc0`.
Ultra-low-bit quantization (1-2 bit)

The latest quantization method introduces precision-adaptive quantization for ultra-low-bit models (1-2 bit), with improvements demonstrated by benchmarks on Llama-3-8B. It uses layer-specific strategies to preserve accuracy while remaining extremely memory-efficient.
Benchmark context

All tests were run on Llama-3-8B-Instruct with the standard perplexity evaluation pipeline, a 2048-token context window, and the same prompt set throughout.
Quantization performance comparison (Llama-3-8B)

Quantization | Standard PPL | DynamicGate PPL | PPL change | Standard size | DG size | Size change | Standard speed | DG speed |
---|---|---|---|---|---|---|---|---|
IQ2_XXS | 11.30 | 9.84 | -12.9% | 2.5G | 2.6G | +0.1G | 234s | 246s |
IQ2_XS | 11.72 | 11.63 | -0.8% | 2.7G | 2.8G | +0.1G | 242s | 246s |
IQ2_S | 14.31 | 9.02 | -36.9% | 2.7G | 2.9G | +0.2G | 238s | 244s |
IQ1_M | 27.46 | 15.41 | -43.9% | 2.2G | 2.5G | +0.3G | 206s | 212s |
IQ1_S | 53.07 | 32.00 | -39.7% | 2.1G | 2.4G | +0.3G | 184s | 209s |
Choosing a model format

Picking the right model format comes down to your hardware capabilities and memory constraints:
Model format | Precision | Memory usage | Device requirements | Best use case |
---|---|---|---|---|
BF16 | Highest | High | GPU/CPU with BF16 acceleration | High-speed inference with reduced memory |
F16 | High | High | FP16-capable devices | GPU inference when BF16 is unavailable |
Q4_K | Medium-low | Low | CPUs or low-VRAM devices | Best choice for memory-constrained environments |
Q6_K | Medium | Moderate | CPUs with more memory | Better accuracy among quantized models |
Q8_0 | High | Moderate | CPUs or GPUs with enough memory | Best accuracy among quantized models |
IQ3_XS | Very low | Very low | Ultra-low-memory devices | Extreme memory efficiency at the cost of accuracy |
Q4_0 | Low | Low | ARM or low-memory devices | llama.cpp can optimize for ARM devices |
Included files and details

Filename | Details |
---|---|
Mistral-Small-3.1-24B-Instruct-2503-bf16.gguf | Weights in BF16; use to requantize into other formats, best on devices with BF16 acceleration. |
Mistral-Small-3.1-24B-Instruct-2503-f16.gguf | Weights in F16; use on devices that support FP16, especially when BF16 is unavailable. |
Mistral-Small-3.1-24B-Instruct-2503-bf16-q8_0.gguf | Output and embedding layers kept in BF16, all other layers quantized to Q8_0; use when your device supports BF16 and you want a quantized version. |
Mistral-Small-3.1-24B-Instruct-2503-f16-q8_0.gguf | Output and embedding layers kept in F16, all other layers quantized to Q8_0. |
Mistral-Small-3.1-24B-Instruct-2503-q4_k.gguf | Output and embedding layers quantized to Q8_0, all other layers quantized to Q4_K; good for CPU inference with limited memory. |
Mistral-Small-3.1-24B-Instruct-2503-q4_k_s.gguf | Smallest Q4_K variant; trades accuracy for lower memory use, for very-low-memory setups. |
Mistral-Small-3.1-24B-Instruct-2503-q6_k.gguf | Output and embedding layers quantized to Q8_0, all other layers quantized to Q6_K. |
Mistral-Small-3.1-24B-Instruct-2503-q8_0.gguf | Fully Q8-quantized model; higher accuracy, but needs more memory. |
Mistral-Small-3.1-24B-Instruct-2503-iq3_xs.gguf | IQ3_XS quantization, optimized for extreme memory efficiency; suited to ultra-low-memory devices. |
Mistral-Small-3.1-24B-Instruct-2503-iq3_m.gguf | IQ3_M quantization with a medium block size for better accuracy; suited to low-memory devices. |
Mistral-Small-3.1-24B-Instruct-2503-q4_0.gguf | Pure Q4_0 quantization optimized for ARM devices, good for low-memory environments; choose IQ4_NL if you need better accuracy. |
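The GGUF files above are intended for llama.cpp-based runtimes. A minimal sketch using the llama-cpp-python bindings, assuming one of the quantized files has been downloaded locally (the path and `n_ctx` value are illustrative, and this covers text-only use):

```python
from llama_cpp import Llama

# Load a local GGUF file; n_ctx is illustrative and can be raised as memory allows.
llm = Llama(
    model_path="./Mistral-Small-3.1-24B-Instruct-2503-q4_k.gguf",
    n_ctx=8192,
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what GGUF is in one sentence."}],
    temperature=0.15,
)
print(response["choices"][0]["message"]["content"])
```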
Testing notes

If you find these models useful, please click "Like". You can also help test the AI network monitor assistant by picking an AI assistant type:

- TurboLLM (GPT-4o-mini)
- HugLLM (Hugging Face open-source)
- TestLLM (experimental, CPU-only)

What is being tested

The limits of small open-source models for AI network monitoring are being explored, including function calling and how small a model can be while still handling the tasks.

Example commands

- "Give me info on my websites SSL certificate"
- "Check if my server is using quantum safe encryption for communication"
- "Run a comprehensive security audit on my server"
- "Create a cmd processor to .. (whatever you want)"
🔧 Technical Details
Basic instruct template (V7-Tekken)
```
<s>[SYSTEM_PROMPT]<system prompt>[/SYSTEM_PROMPT][INST]<user message>[/INST]<assistant response></s>[INST]<user message>[/INST]
```
`<system prompt>`, `<user message>`, and `<assistant response>` are placeholders. Please make sure to use mistral-common as the reference.
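For illustration only, a tiny helper that renders the first turn of the V7-Tekken template above by hand (in practice, rely on mistral-common rather than hand-rolled strings):

```python
def render_v7_tekken(system_prompt: str, user_message: str) -> str:
    # Renders the single-turn portion of the V7-Tekken template shown above;
    # the model's reply would follow, terminated by </s>.
    return (
        f"<s>[SYSTEM_PROMPT]{system_prompt}[/SYSTEM_PROMPT]"
        f"[INST]{user_message}[/INST]"
    )

print(render_v7_tekken("You are a helpful assistant.", "Hello!"))
```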
Benchmark results
Pretraining evaluation
Model | MMLU (5-shot) | MMLU Pro (5-shot CoT) | TriviaQA | GPQA Main (5-shot CoT) | MMMU |
---|---|---|---|---|---|
Small 3.1 24B Base | 81.01% | 56.03% | 80.50% | 37.50% | 59.27% |
Gemma 3 27B PT | 78.60% | 52.20% | 81.30% | 24.30% | 56.10% |
Instruct evaluation - text
Model | MMLU | MMLU Pro (5-shot CoT) | MATH | GPQA Main (5-shot CoT) | GPQA Diamond (5-shot CoT) | MBPP | HumanEval | SimpleQA (TotalAcc) |
---|---|---|---|---|---|---|---|---|
Small 3.1 24B Instruct | 80.62% | 66.76% | 69.30% | 44.42% | 45.96% | 74.71% | 88.41% | 10.43% |
Gemma 3 27B IT | 76.90% | 67.50% | 89.00% | 36.83% | 42.40% | 74.40% | 87.80% | 10.00% |
GPT4o Mini | 82.00% | 61.70% | 70.20% | 40.20% | 39.39% | 84.82% | 87.20% | 9.50% |
Claude 3.5 Haiku | 77.60% | 65.00% | 69.20% | 37.05% | 41.60% | 85.60% | 88.10% | 8.02% |
Cohere Aya-Vision 32B | 72.14% | 47.16% | 41.98% | 34.38% | 33.84% | 70.43% | 62.20% | 7.65% |
Instruct evaluation - vision
Model | MMMU | MMMU PRO | Mathvista | ChartQA | DocVQA | AI2D | MM MT Bench |
---|---|---|---|---|---|---|---|
Small 3.1 24B Instruct | 64.00% | 49.25% | 68.91% | 86.24% | 94.08% | 93.72% | 7.3 |
Gemma 3 27B IT | 64.90% | 48.38% | 67.60% | 76.00% | 86.60% | 84.50% | 7 |
GPT4o Mini | 59.40% | 37.60% | 56.70% | 76.80% | 86.70% | 88.10% | 6.6 |
Claude 3.5 Haiku | 60.50% | 45.03% | 61.60% | 87.20% | 90.00% | 92.10% | 6.5 |
Cohere Aya-Vision 32B | 48.20% | 31.50% | 50.10% | 63.04% | 72.40% | 82.57% | 4.1 |
Multilingual evaluation
Model | Average | European languages | East Asian languages | Middle Eastern languages |
---|---|---|---|---|
Small 3.1 24B Instruct | 71.18% | 75.30% | 69.17% | 69.08% |
Gemma 3 27B IT | 70.19% | 74.14% | 65.65% | 70.76% |
GPT4o Mini | 70.36% | 74.21% | 65.96% | 70.90% |
Claude 3.5 Haiku | 70.16% | 73.45% | 67.05% | 70.00% |
Cohere Aya-Vision 32B | 62.15% | 64.70% | 57.61% | 64.12% |
Long-context evaluation
Model | LongBench v2 | RULER 32K | RULER 128K |
---|---|---|---|
Small 3.1 24B Instruct | 37.18% | 93.96% | 81.20% |
Gemma 3 27B IT | 34.59% | 91.10% | 66.00% |
GPT4o Mini | 29.30% | 90.20% | 65.80% |
Claude 3.5 Haiku | 35.19% | 92.60% | 91.90% |
📄 License

This model is released under the Apache 2.0 license.
⚠️ Important Notes

A relatively low temperature, such as `temperature=0.15`, is recommended. Make sure to add a system prompt tailored to your needs.
💡 Usage Tips

The Transformers implementation has not been thoroughly tested, only "vibe-checked"; using the vLLM library guarantees 100% correct behavior.



