🚀 Mistral-Small-3.1-24B-Instruct-2503 GGUF Models
The Mistral-Small-3.1-24B-Instruct-2503 GGUF models are high-performance language models built on the Mistral architecture, with advanced capabilities such as multilingual support and vision understanding, suitable for a wide range of natural language processing and visual analysis tasks.
🚀 Quick Start
Installing dependencies
To use this model, it is recommended to build a production-ready inference pipeline with the vLLM library. Make sure to install vLLM >= 0.8.1:

```bash
pip install vllm --upgrade
```
Doing so should automatically install mistral_common >= 1.5.4. You can check the installed version with:

```bash
python -c "import mistral_common; print(mistral_common.__version__)"
```
You can also use a ready-made Docker image, or pull one from Docker Hub.
Starting the server
Mistral-Small-3.1-24B-Instruct-2503 is best used in a server/client setting:
- Start the server:

```bash
vllm serve mistralai/Mistral-Small-3.1-24B-Instruct-2503 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --limit_mm_per_prompt 'image=10' --tensor-parallel-size 2
```
Note: running Mistral-Small-3.1-24B-Instruct-2503 on GPU requires roughly 55 GB of GPU RAM in bf16 or fp16.
- Send requests from a client with a simple Python snippet:
```python
import requests
import json
from huggingface_hub import hf_hub_download
from datetime import datetime, timedelta

url = "http://<your-server-url>:8000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}

model = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"

def load_system_prompt(repo_id: str, filename: str) -> str:
    # Download the system prompt shipped with the model repository and
    # fill in the model name and the current dates.
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    today = datetime.today().strftime("%Y-%m-%d")
    yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)

SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

image_url = "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/europe.png"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Which of the depicted countries has the best food? Which the second and third and fourth? Name the country, its color on the map and one its city that is visible on the map, but is not the capital. Make absolutely sure to only name a city that can be seen on the map.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]

data = {"model": model, "messages": messages, "temperature": 0.15}

response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["content"])
# Determining the "best" food is highly subjective and depends on personal preferences. However, based on general popularity and recognition, here are some countries known for their cuisine:
# 1. **Italy** - Color: Light Green - City: Milan
#    - Italian cuisine is renowned worldwide for its pasta, pizza, and various regional specialties.
# 2. **France** - Color: Brown - City: Lyon
#    - French cuisine is celebrated for its sophistication, including dishes like coq au vin, bouillabaisse, and pastries like croissants and éclairs.
# 3. **Spain** - Color: Yellow - City: Bilbao
#    - Spanish cuisine offers a variety of flavors, from paella and tapas to jamón ibérico and churros.
# 4. **Greece** - Not visible on the map
#    - Greek cuisine is known for dishes like moussaka, souvlaki, and baklava. Unfortunately, Greece is not visible on the provided map, so I cannot name a city.
# Since Greece is not visible on the map, I'll replace it with another country known for its good food:
# 4. **Turkey** - Color: Light Green (east part of the map) - City: Istanbul
#    - Turkish cuisine is diverse and includes dishes like kebabs, meze, and baklava.
```
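Before sending chat requests, you may want to confirm the server is reachable. A minimal sketch below queries the OpenAI-compatible `/v1/models` route that `vllm serve` exposes, using the same placeholder URL and token as above:

```python
import requests

# List the models the server is currently serving to confirm it is up.
url = "http://<your-server-url>:8000/v1/models"
headers = {"Authorization": "Bearer token"}

response = requests.get(url, headers=headers)
response.raise_for_status()
for entry in response.json()["data"]:
    print(entry["id"])
```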
✨ Key Features
- Vision understanding: the model can analyze images and provide insights based on visual content.
- Multilingual: supports dozens of languages, including English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, and Farsi.
- Agent-centric: best-in-class agentic capabilities with native function calling and JSON output.
- Advanced reasoning: state-of-the-art conversational and reasoning capabilities.
- Open license: Apache 2.0 license, allowing use and modification for both commercial and non-commercial purposes.
- Long context window: a 128k context window.
- System prompt support: strong adherence to and support for system prompts.
- Tokenizer: uses a Tekken tokenizer with a 131k vocabulary.
📦 Installation Guide
Installing vLLM
Make sure to install vLLM >= 0.8.1:

```bash
pip install vllm --upgrade
```
After installation, you can check the mistral_common version with:

```bash
python -c "import mistral_common; print(mistral_common.__version__)"
```
You can also use a Docker image or pull one from Docker Hub.
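To confirm that the vLLM version itself meets the requirement, the same pattern works (assuming a recent release that exposes `vllm.__version__`):

```bash
python -c "import vllm; print(vllm.__version__)"
```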
💻 Usage Examples
Basic usage
The basic client request is identical to the snippet shown under Quick Start above: load the repository's SYSTEM_PROMPT.txt via huggingface_hub, build a messages list with a text-plus-image user turn, and POST it to the /v1/chat/completions endpoint with temperature=0.15.
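Because `vllm serve` speaks the OpenAI chat-completions protocol, the official `openai` Python client also works as a drop-in replacement for raw `requests` calls. A minimal sketch, assuming `pip install openai` and the same server as above (the system prompt here is a placeholder string rather than the repository's SYSTEM_PROMPT.txt):

```python
from openai import OpenAI

# Point the client at the vLLM server; the api_key only needs to match the
# token the server expects (any string if the server was started without one).
client = OpenAI(base_url="http://<your-server-url>:8000/v1", api_key="token")

response = client.chat.completions.create(
    model="mistralai/Mistral-Small-3.1-24B-Instruct-2503",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Briefly introduce yourself."},
    ],
    temperature=0.15,
)
print(response.choices[0].message.content)
```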
Advanced usage
```python
import requests
import json
from huggingface_hub import hf_hub_download
from datetime import datetime, timedelta

url = "http://<your-url>:8000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}

model = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"

def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    today = datetime.today().strftime("%Y-%m-%d")
    yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)

SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city to find the weather for, e.g. 'San Francisco'",
                    },
                    "state": {
                        "type": "string",
                        "description": "The state abbreviation, e.g. 'CA' for California",
                    },
                    "unit": {
                        "type": "string",
                        "description": "The unit for temperature",
                        "enum": ["celsius", "fahrenheit"],
                    },
                },
                "required": ["city", "state", "unit"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "rewrite",
            "description": "Rewrite a given text for improved clarity",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "description": "The input text to rewrite",
                    }
                },
            },
        },
    },
]

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": "Could you please make the below article more concise?\n\nOpenAI is an artificial intelligence research laboratory consisting of the non-profit OpenAI Incorporated and its for-profit subsidiary corporation OpenAI Limited Partnership.",
    },
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "id": "bbc5b7ede",
                "type": "function",
                "function": {
                    "name": "rewrite",
                    "arguments": '{"text": "OpenAI is an artificial intelligence research laboratory consisting of the non-profit OpenAI Incorporated and its for-profit subsidiary corporation OpenAI Limited Partnership."}',
                },
            }
        ],
    },
    {
        "role": "tool",
        "content": '{"action":"rewrite","outcome":"OpenAI is a FOR-profit company."}',
        "tool_call_id": "bbc5b7ede",
        "name": "rewrite",
    },
    {
        "role": "assistant",
        "content": "---\n\nOpenAI is a FOR-profit company.",
    },
    {
        "role": "user",
        "content": "Can you tell me what the temperature will be in Dallas, in Fahrenheit?",
    },
]

data = {"model": model, "messages": messages, "tools": tools, "temperature": 0.15}

response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["tool_calls"])
```
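To close the function-calling loop, the returned tool call has to be executed locally and its result appended as a `tool` message before asking the model again. A sketch continuing the script above, with a hypothetical local stub standing in for a real `get_current_weather` implementation:

```python
# Hypothetical stub standing in for a real weather lookup.
def get_current_weather(city: str, state: str, unit: str) -> str:
    return f"It is 25 degrees {unit} in {city}, {state}."

tool_call = response.json()["choices"][0]["message"]["tool_calls"][0]
arguments = json.loads(tool_call["function"]["arguments"])
result = get_current_weather(**arguments)

# Record the assistant's tool call and the tool result, then query again.
messages.append({"role": "assistant", "content": "", "tool_calls": [tool_call]})
messages.append(
    {
        "role": "tool",
        "content": result,
        "tool_call_id": tool_call["id"],
        "name": tool_call["function"]["name"],
    }
)

data = {"model": model, "messages": messages, "tools": tools, "temperature": 0.15}
response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["content"])
```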
📚 Documentation
Model generation details

This model was generated with llama.cpp at commit `92ecdcc0`.
Ultra-low-bit quantization (1-2 bit)

The latest quantization method introduces precision-adaptive quantization for ultra-low-bit models (1-2 bit), with improvements demonstrated by benchmarks on Llama-3-8B. It uses layer-specific strategies to preserve accuracy while remaining extremely memory-efficient.
Benchmark context

All tests were run on Llama-3-8B-Instruct with the standard perplexity evaluation pipeline, a 2048-token context window, and the same prompt set throughout.
Quantization performance comparison (Llama-3-8B)

Quantization | Standard PPL | DynamicGate PPL | PPL change | Standard size | DG size | Size change | Standard speed | DG speed |
---|---|---|---|---|---|---|---|---|
IQ2_XXS | 11.30 | 9.84 | -12.9% | 2.5G | 2.6G | +0.1G | 234s | 246s |
IQ2_XS | 11.72 | 11.63 | -0.8% | 2.7G | 2.8G | +0.1G | 242s | 246s |
IQ2_S | 14.31 | 9.02 | -36.9% | 2.7G | 2.9G | +0.2G | 238s | 244s |
IQ1_M | 27.46 | 15.41 | -43.9% | 2.2G | 2.5G | +0.3G | 206s | 212s |
IQ1_S | 53.07 | 32.00 | -39.7% | 2.1G | 2.4G | +0.3G | 184s | 209s |
Choosing a model format

Picking the right model format comes down to your hardware capabilities and memory constraints:
Model format | Precision | Memory usage | Device requirements | Best use case |
---|---|---|---|---|
BF16 | Highest | High | GPU/CPU with BF16 acceleration | High-speed inference with reduced memory |
F16 | High | High | FP16-capable devices | GPU inference when BF16 is unavailable |
Q4_K | Medium-low | Low | CPUs or low-VRAM devices | Best choice for memory-constrained environments |
Q6_K | Medium | Moderate | CPUs with more memory | Better accuracy among quantized models |
Q8_0 | High | Moderate | CPUs or GPUs with enough memory | Best accuracy among quantized models |
IQ3_XS | Very low | Very low | Ultra-low-memory devices | Extreme memory efficiency at the cost of accuracy |
Q4_0 | Low | Low | ARM or low-memory devices | llama.cpp can optimize for ARM devices |
Included files and details

Filename | Details |
---|---|
Mistral-Small-3.1-24B-Instruct-2503-bf16.gguf | Weights in BF16; use to requantize into other formats, best on devices with BF16 acceleration. |
Mistral-Small-3.1-24B-Instruct-2503-f16.gguf | Weights in F16; use on devices that support FP16, especially when BF16 is unavailable. |
Mistral-Small-3.1-24B-Instruct-2503-bf16-q8_0.gguf | Output and embedding layers kept in BF16, all other layers quantized to Q8_0; use when your device supports BF16 and you want a quantized version. |
Mistral-Small-3.1-24B-Instruct-2503-f16-q8_0.gguf | Output and embedding layers kept in F16, all other layers quantized to Q8_0. |
Mistral-Small-3.1-24B-Instruct-2503-q4_k.gguf | Output and embedding layers quantized to Q8_0, all other layers quantized to Q4_K; good for CPU inference with limited memory. |
Mistral-Small-3.1-24B-Instruct-2503-q4_k_s.gguf | Smallest Q4_K variant; trades accuracy for lower memory use, for very-low-memory setups. |
Mistral-Small-3.1-24B-Instruct-2503-q6_k.gguf | Output and embedding layers quantized to Q8_0, all other layers quantized to Q6_K. |
Mistral-Small-3.1-24B-Instruct-2503-q8_0.gguf | Fully Q8-quantized model; higher accuracy, but needs more memory. |
Mistral-Small-3.1-24B-Instruct-2503-iq3_xs.gguf | IQ3_XS quantization, optimized for extreme memory efficiency; suited to ultra-low-memory devices. |
Mistral-Small-3.1-24B-Instruct-2503-iq3_m.gguf | IQ3_M quantization with a medium block size for better accuracy; suited to low-memory devices. |
Mistral-Small-3.1-24B-Instruct-2503-q4_0.gguf | Pure Q4_0 quantization optimized for ARM devices, good for low-memory environments; choose IQ4_NL if you need better accuracy. |
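The GGUF files above are intended for llama.cpp-based runtimes. A minimal sketch using the llama-cpp-python bindings, assuming one of the quantized files has been downloaded locally (the path and `n_ctx` value are illustrative, and this covers text-only use):

```python
from llama_cpp import Llama

# Load a local GGUF file; n_ctx is illustrative and can be raised as memory allows.
llm = Llama(
    model_path="./Mistral-Small-3.1-24B-Instruct-2503-q4_k.gguf",
    n_ctx=8192,
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what GGUF is in one sentence."}],
    temperature=0.15,
)
print(response["choices"][0]["message"]["content"])
```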
Testing notes

If you find these models useful, please click "Like". You can also help test the AI network monitor assistant by picking an AI assistant type:

- TurboLLM (GPT-4o-mini)
- HugLLM (Hugging Face open-source)
- TestLLM (experimental, CPU-only)

What is being tested

The limits of small open-source models for AI network monitoring are being explored, including function calling and how small a model can be while still handling the tasks.

Example commands

- "Give me info on my websites SSL certificate"
- "Check if my server is using quantum safe encryption for communication"
- "Run a comprehensive security audit on my server"
- "Create a cmd processor to .. (whatever you want)"
🔧 Technical Details
Basic instruct template (V7-Tekken)
```
<s>[SYSTEM_PROMPT]<system prompt>[/SYSTEM_PROMPT][INST]<user message>[/INST]<assistant response></s>[INST]<user message>[/INST]
```
`<system prompt>`, `<user message>`, and `<assistant response>` are placeholders. Please make sure to use mistral-common as the reference.
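For illustration only, a tiny helper that renders the first turn of the V7-Tekken template above by hand (in practice, rely on mistral-common rather than hand-rolled strings):

```python
def render_v7_tekken(system_prompt: str, user_message: str) -> str:
    # Renders the single-turn portion of the V7-Tekken template shown above;
    # the model's reply would follow, terminated by </s>.
    return (
        f"<s>[SYSTEM_PROMPT]{system_prompt}[/SYSTEM_PROMPT]"
        f"[INST]{user_message}[/INST]"
    )

print(render_v7_tekken("You are a helpful assistant.", "Hello!"))
```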
Benchmark results
Pretraining evaluation
Model | MMLU (5-shot) | MMLU Pro (5-shot CoT) | TriviaQA | GPQA Main (5-shot CoT) | MMMU |
---|---|---|---|---|---|
Small 3.1 24B Base | 81.01% | 56.03% | 80.50% | 37.50% | 59.27% |
Gemma 3 27B PT | 78.60% | 52.20% | 81.30% | 24.30% | 56.10% |
Instruct evaluation - text
Model | MMLU | MMLU Pro (5-shot CoT) | MATH | GPQA Main (5-shot CoT) | GPQA Diamond (5-shot CoT) | MBPP | HumanEval | SimpleQA (TotalAcc) |
---|---|---|---|---|---|---|---|---|
Small 3.1 24B Instruct | 80.62% | 66.76% | 69.30% | 44.42% | 45.96% | 74.71% | 88.41% | 10.43% |
Gemma 3 27B IT | 76.90% | 67.50% | 89.00% | 36.83% | 42.40% | 74.40% | 87.80% | 10.00% |
GPT4o Mini | 82.00% | 61.70% | 70.20% | 40.20% | 39.39% | 84.82% | 87.20% | 9.50% |
Claude 3.5 Haiku | 77.60% | 65.00% | 69.20% | 37.05% | 41.60% | 85.60% | 88.10% | 8.02% |
Cohere Aya-Vision 32B | 72.14% | 47.16% | 41.98% | 34.38% | 33.84% | 70.43% | 62.20% | 7.65% |
Instruct evaluation - vision
Model | MMMU | MMMU PRO | Mathvista | ChartQA | DocVQA | AI2D | MM MT Bench |
---|---|---|---|---|---|---|---|
Small 3.1 24B Instruct | 64.00% | 49.25% | 68.91% | 86.24% | 94.08% | 93.72% | 7.3 |
Gemma 3 27B IT | 64.90% | 48.38% | 67.60% | 76.00% | 86.60% | 84.50% | 7 |
GPT4o Mini | 59.40% | 37.60% | 56.70% | 76.80% | 86.70% | 88.10% | 6.6 |
Claude 3.5 Haiku | 60.50% | 45.03% | 61.60% | 87.20% | 90.00% | 92.10% | 6.5 |
Cohere Aya-Vision 32B | 48.20% | 31.50% | 50.10% | 63.04% | 72.40% | 82.57% | 4.1 |
Multilingual evaluation
Model | Average | European languages | East Asian languages | Middle Eastern languages |
---|---|---|---|---|
Small 3.1 24B Instruct | 71.18% | 75.30% | 69.17% | 69.08% |
Gemma 3 27B IT | 70.19% | 74.14% | 65.65% | 70.76% |
GPT4o Mini | 70.36% | 74.21% | 65.96% | 70.90% |
Claude 3.5 Haiku | 70.16% | 73.45% | 67.05% | 70.00% |
Cohere Aya-Vision 32B | 62.15% | 64.70% | 57.61% | 64.12% |
Long-context evaluation
Model | LongBench v2 | RULER 32K | RULER 128K |
---|---|---|---|
Small 3.1 24B Instruct | 37.18% | 93.96% | 81.20% |
Gemma 3 27B IT | 34.59% | 91.10% | 66.00% |
GPT4o Mini | 29.30% | 90.20% | 65.80% |
Claude 3.5 Haiku | 35.19% | 92.60% | 91.90% |
📄 License

This model is released under the Apache 2.0 license.
⚠️ Important Notes

A relatively low temperature, such as `temperature=0.15`, is recommended. Make sure to add a system prompt tailored to your needs.
💡 Usage Tips

The Transformers implementation has not been thoroughly tested, only "vibe-checked"; using the vLLM library guarantees 100% correct behavior.



