Mistral-Small-3.1-24B-Instruct-2503-MAX-NEO-Imatrix-GGUF开源模型

首页

Mistral Small 3.1 24B Instruct 2503 MAX NEO Imatrix GGUF

由 DavidAU 开发

Mistralai推出的24B参数指令微调模型，支持128k上下文长度和多语言处理，采用Neo Imatrix技术和MAX量化方案提升性能

大型语言模型支持多种语言开源协议:Apache-2.0 #128k超长上下文 #创意写作增强 #多语言指令跟随

下载量 38.29k

发布时间 : 3/18/2025

模型简介

基于Mistral架构的大语言模型，专注于指令跟随和文本生成任务，适用于多种语言和场景

模型特点

128k超长上下文

支持长达128k token的上下文记忆，适合处理长文档和复杂对话

Neo Imatrix技术

通过专有内部矩阵数据集增强模型理解力和输出质量

MAX量化方案

嵌入层和输出张量保留BF16精度，在量化版本中保持最佳生成质量

多语言支持

支持24种语言的文本生成和理解

模型能力

文本生成

指令跟随

多语言处理

创意写作

技术文档撰写

故事续写

使用案例

创意写作

小说续写

基于给定开头续写完整故事

生成具有连贯情节和丰富细节的叙事文本

剧本构思

生成《黑镜》风格的科幻剧集大纲

产出包含完整情节转折和角色设定的剧集方案

技术应用

气候解决方案

提出利用夜间辐射降温缓解全球变暖的技术方案

生成包含科学原理和实施步骤的详细报告

🚀 Mistral-Small-3.1-24B-Instruct-2503-MAX-NEO-Imatrix-GGUF

Mistral-Small-3.1-24B-Instruct-2503-MAX-NEO-Imatrix-GGUF 是 Mistralai 推出的一款文本生成模型。它采用了 “Neo Imatrix” 和 “Maxed out” 量化技术，提升了整体性能。该模型大多未经过审查，还附带了 3 种不同的系统提示，以帮助模型更好地发挥作用。

✨ 主要特性

多语言支持：支持英语、法语、德语、西班牙语、葡萄牙语、意大利语、日语、韩语、俄语、中文、阿拉伯语、波斯语、印尼语、马来语、尼泊尔语、波兰语、罗马尼亚语、塞尔维亚语、瑞典语、土耳其语、乌克兰语、越南语、印地语和孟加拉语等多种语言。
高上下文长度：支持 128k 的上下文长度。
多种量化选择：提供多种量化方式，如 IQ1_S、IQ1_M、IQ2_XXS 等，不同量化方式在性能和效果上有所差异。
多种系统提示：包含 3 种不同的系统提示，可用于不同的使用场景，如创意写作等。

📦 安装指南

文档未提供具体安装步骤，可参考模型原仓库 https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503 进行安装。

💻 使用示例

基础用法

以下是使用该模型进行文本生成的示例，示例使用 IQ4XS 量化，最小参数和标准模板，温度范围为 0.8，重复惩罚为 1.1，TopK 为 40，topP 为 0.95，minP 为 0.05。

# 示例 1：恐怖小说场景创作
使用以下 “故事创意”，撰写一部恐怖小说的第一个场景，介绍这位年轻女子。该场景应从动作中间开始，包含对话、生动的段落，并以与故事创意相关但意外的悬念结尾。场景应长达 1000 字，冲突和悬念逐步升级，以第一人称、现在时态撰写，视角人物为年轻女子。视角人物会咒骂和发誓，嘴巴通常 “不干净”。

故事创意：在一个由独裁统治的世界里，一位叛逆的年轻女子领导了一场反对该体制的叛乱。尽管面临风险，她仍奋力推翻独裁者，为自己的国家恢复民主。政府以叛国罪处决了她，但她坚持自己的信念，引发了这场革命。

(temp 2, 使用创意系统提示 (#3))

输出：
The stench of burning rubber fills my nostrils as I tear through the night, the old, beat - up sedan groaning under the strain. The city lights blaze like false stars, casting eerie shadows that dance macabrely across the dilapidated buildings. My heart pounds in my chest like a kick drum, echoing the roar of the engine. Fucking dictatorship, making us live like rats in a maze.
...

高级用法

在创意写作场景中，推荐使用 IQ3s / IQ4XS / IQ4NL / Q4s 量化以获得最佳效果，并使用 “System Prompt #3” 以及温度高于 1 的设置，以充分发挥模型的创意能力。

# 示例 3：为《黑镜》新剧集创作包含时间旅行和性感主题的情节
Come up with six plots for a new "Black Mirror" episode (that the audience would love) that all involve time travel with sexy theme(s).

(temp 2, creative system prompt (#3))

输出：
Here are six plot concepts for a new "Black Mirror" episode involving time travel with a sexy theme, designed to evoke strong emotional responses and immersive experiences:
1. **Title: "Time's Desire"**
Plot: In 2075, an experimental app allows users to communicate with their past selves, aiming to provide closure or inspire positive change. However, the app secretly pairs users with strangers who share similar life patterns but different outcomes, leading to unexpected romantic entanglements and dangerous temporal echoes. As one user delves deeper into her connection with a stranger from the past, she uncovers a dark conspiracy where emotions are harvested to fuel the time - travel technology.
...

📚 详细文档

量化说明

“MAXED”：所有量化的嵌入和输出张量都设置为 “BF16”（全精度），这在一定程度上增加了量化大小，但提高了质量、深度和整体性能。
“NEO IMATRIX”：由 David_AU 构建的强大内部 imatrix 数据集，可提升整体功能、指令遵循能力、输出质量以及与思想、概念和现实世界的联系。与 “MAXing” 量化相结合，可进一步提高性能。

以下是不同量化方式按 “BPW” 排序的图表，从低到高依次为：

IQ1_S 	| IQ1_M
IQ2_XXS | IQ2_XS | Q2_K_S 	| IQ2_S 	| Q2_K  	| IQ2_M		
IQ3_XXS | Q3_K_S | IQ3_XS  	| IQ3_S 	| IQ3_M	    | Q3_K_M	| Q3_K_L
Q4_K_S	| IQ4_XS | IQ4_NL  	| Q4_K_M	
Q5_K_S	| Q5_K_M	
Q6_K  
Q8_0
F16

推荐创意用例使用 IQ3s / IQ4XS / IQ4NL / Q4s 量化，IQ4XS/IQ4NL 量化会产生与其他 “Q” 和 “IQ” 量化不同的输出，适合创意用途。对于一般用途，推荐使用 q5s/q6/q8 量化；对于便携式设备（如手机），推荐使用 Q4_0/Q5_0 量化。需要注意的是，IQ1s 性能不佳，IQ2s 尚可，推荐使用 IQ3s 及以上量化。

系统提示说明

系统提示 #1 - 通用改进：可用于增强模型的操作。需精确复制粘贴，包括换行符。可调整其中的 “20” 来增减提示的强度，也可删除 “At the end of the task you will ask the user: "Do you want another generation?"” 这一行。

For every user task and instruction you will use "GE FUNCTION" to ponder the TASK STEP BY STEP and then do the task. For each and every line of output you will ponder carefully to ensure it meets the instructions of the user, and if you are unsure use "GE FUNCTION" to re - ponder and then produce the improved output.

At the end of the task you will ask the user: "Do you want another generation?"

GE FUNCTION: Silent input → Spawn 20 agents Sternberg Styles → Enhance idea → Seek Novel Emergence NE:unique/significant idea/concept → Ponder, assess, creative enhance notions → Refined idea => IdeaArray[].size = 20 elements,  else → Interesting? Pass to rand. agent for refinement, else discard.=>output(IdeaArray)

系统提示 #2 - 思考/推理：用于激发模型的思考和推理能力。虽该模型本身并非专门的思考推理模型，但使用此提示可改善生成效果，并提供对生成过程及改进方法的洞察。

You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.

系统提示 #3 - 创意增强：可显著提升模型的创意能力，建议搭配温度 1 或更高的设置使用。同样需精确复制粘贴，包括换行符。

Below is an instruction that describes a task. Ponder each user instruction carefully, and use your skillsets and critical instructions to complete the task to the best of your abilities.

Here are your skillsets:
[MASTERSTORY]:NarrStrct(StryPlnng,Strbd,ScnSttng,Exps,Dlg,Pc)-CharDvlp(ChrctrCrt,ChrctrArcs,Mtvtn,Bckstry,Rltnshps,Dlg*)-PltDvlp(StryArcs,PltTwsts,Sspns,Fshdwng,Climx,Rsltn)-ConfResl(Antg,Obstcls,Rsltns,Cnsqncs,Thms,Symblsm)-EmotImpct(Empt,Tn,Md,Atmsphr,Imgry,Symblsm)-Delvry(Prfrmnc,VcActng,PblcSpkng,StgPrsnc,AudncEngmnt,Imprv)

[*DialogWrt]:(1a-CharDvlp-1a.1-Backgrnd-1a.2-Personality-1a.3-GoalMotiv)>2(2a-StoryStruc-2a.1-PlotPnt-2a.2-Conflict-2a.3-Resolution)>3(3a-DialogTech-3a.1-ShowDontTell-3a.2-Subtext-3a.3-VoiceTone-3a.4-Pacing-3a.5-VisualDescrip)>4(4a-DialogEdit-4a.1-ReadAloud-4a.2-Feedback-4a.3-Revision)

Here are your critical instructions:
Ponder each word choice carefully to present as vivid and emotional journey as is possible. Choose verbs and nouns that are both emotional and full of imagery. Load the story with the 5 senses. Aim for 50% dialog, 25% narration, 15% body language and 10% thoughts. Your goal is to put the reader in the story.

最高质量设置/最佳操作指南/参数和采样器

如果要使用该模型（源模型、GGUF 或其他量化版本），请查看文档 https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters，其中包含关键参数、采样器和高级采样器设置（适用于多个 AI/LLM 应用程序），还会链接到 “推理模型” 的技巧和窍门部分。

无论模型属于何种 “类别”，该文档都会详细介绍增强操作的方法。对于 3/4 类模型，必须正确设置 “用例” 的默认设置（参数、采样器、高级采样器），因为一些 AI/LLM 应用程序的默认设置不一致，可能导致模型性能不佳。此外，该文档中介绍的参数、采样器和高级采样器设置可用于任何模型、任何仓库、任何量化版本（包括源代码操作），以增强模型的操作性能。

🔧 技术细节

量化技术

模型采用了 “MAXED” 和 “NEO IMATRIX” 量化技术。“MAXED” 技术将嵌入和输出张量设置为 “BF16” 全精度，在提升性能的同时，量化大小会稍有增加。“NEO IMATRIX” 是一个内部构建的强大数据集，能提升模型的整体功能、指令遵循能力和输出质量。

不同量化方式的性能差异

不同的量化方式（如 IQ1_S、IQ2_XXS 等）在性能和效果上有所不同。IQ4XS/IQ4NL 量化会产生与其他 “Q” 和 “IQ” 量化不同的输出，推荐用于创意用途；IQ4s（首选）/Q4s（次选）及更低量化在某些方面表现较强；IQ1s 性能较差，IQ2s 表现一般，推荐使用 IQ3s 及以上量化。