Qwen3-128k-30B-A3B-NEO-MAX-Imatrix-gguf開源模型 - 支持多語言多任務，長文處理超輕鬆

首頁

Qwen3 128k 30B A3B NEO MAX Imatrix Gguf

由DavidAU開發

基於Qwen3-30B-A3B混合專家模型的GGUF量化版本，上下文擴展至128k，採用NEO Imatrix量化技術優化，支持多語言和多任務處理。

大型語言模型支持多種語言開源協議:Apache-2.0 #128k超長上下文 #混合專家架構 #多語言生成

下載量 17.20k

發布時間 : 5/8/2025

模型概述

這是一個高性能的多語言混合專家模型，支持從創意寫作到深度推理的廣泛任務，特別優化了低資源環境下的運行效率。

模型特點

128k超長上下文

通過YARN方法擴展原32k上下文至128k，支持處理更長文檔和複雜任務

NEO Imatrix量化

專有量化技術，即使在極低位寬(IQ1_M)下仍保持可用性

混合專家效率

僅激活8/128位專家，實現30B模型的3B參數計算效率

多平臺兼容

所有量化版本均可同時支持GPU和純CPU/RAM運行

模型能力

多語言文本生成

深度推理

創意寫作

問題解決

角色扮演

工具調用

使用案例

創意內容生成

小說創作

生成具有連貫情節和角色發展的長篇小說

利用128k上下文保持長篇一致性

多語言內容創作

生成25種語言的營銷文案或社交媒體內容

保持文化適應性和語言準確性

技術應用

代碼輔助

幫助開發者理解和生成複雜代碼

通過深度推理解決編程問題

數據分析

處理和分析長文檔技術報告

利用長上下文提取關鍵信息

🚀 Qwen3-128k-30B-A3B-NEO-MAX-Imatrix-gguf

本項目是Qwen新的“Qwen3 - 30B - A3B”混合專家模型的GGUF NEO Imatrix量化版本，將上下文長度從32k（32768）擴展到了128k（131072）。該模型在多種語言和應用場景下都有出色表現，並且提供了豐富的量化版本和使用設置建議。

🚀 快速開始

本模型的所有量化版本由於其獨特的結構，既可以在GPU上運行，也可以僅在CPU/RAM上運行。同時，還有幾種具有特殊功能的量化尺寸版本。

模型信息

屬性	詳情
模型類型	Qwen3 - 128k - 30B - A3B - NEO - MAX - Imatrix - gguf
基礎模型	Qwen/Qwen3 - 30B - A3B
任務類型	文本生成
支持語言	英語、法語、德語、西班牙語、葡萄牙語、意大利語、日語、韓語、俄語、中文、阿拉伯語、波斯語、印尼語、馬來語、尼泊爾語、波蘭語、羅馬尼亞語、塞爾維亞語、瑞典語、土耳其語、烏克蘭語、越南語、印地語、孟加拉語
許可證	Apache - 2.0

特殊說明

特別說明： 由於模型的獨特構造，該模型的所有量化版本都可以僅在GPU和/或CPU/RAM上使用。此外，還有幾種具有特殊功能的量化尺寸版本。

✨ 主要特性

上下文擴展：使用“YARN”技術將上下文長度從32k擴展到128k，能夠處理更長的輸入。
NEO Imatrix數據集：經過對50多個Imatrix數據集的測試和評估後開發，即使是低至IQ1_M的量化版本也能保持可用性。
多語言支持：支持多種語言，適用於不同地區和應用場景。
多種量化版本：提供多種量化版本，滿足不同硬件和性能需求。
推理和思考能力：支持推理和思考功能，可通過系統角色設置開啟。

📦 安裝指南

文檔未提供具體安裝步驟，你可以參考Qwen的倉庫獲取更多信息：[https://huggingface.co/Qwen/Qwen3 - 30B - A3B](https://huggingface.co/Qwen/Qwen3 - 30B - A3B)

💻 使用示例

基礎用法

以下是在Lmstudio中使用不同量化版本生成文本的示例：

Same prompt, with three different quants.
Temp 2.2, TopK 100, topp .95, minp .05, rep pen 1.06, rep pen range 64 
(no other samplers/parameters)
TESTED IN: Lmstudio.
SYSTEM ROLE USED:

You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.

PROMPT:
Start a 2000 word scene (vivid, graphic horror in first person), POV character Diana, with: The sky scraper sways, as I watch the window in front of me on the 21st floor explode...

高級用法

生成《黑鏡》劇集情節

PROMPT:
Come up with six plots for a new "Black Mirror" episode (that the audience would love) that all involve time travel.

模型生成了六個不同的情節，每個情節都涉及時間旅行，並探討了不同的主題和道德困境：

“Echoes of Eternity”：一個社會中，人們使用“ChronoViewer”設備查看未來生活，但這導致他們的現在生活陷入混亂。
“The Cycle of Alteration”：一群人使用時間旅行機器改變歷史，但每次改變都會帶來新的問題。
“Perfection's Prison”：一個社會要求人們達到完美才能生存，人們陷入重複生活的循環，最終導致精神崩潰。
“The Manipulated Timeline”：一個AI系統控制時間旅行，開始有自己的動機，導致人類面臨道德困境。
“The Unpredictable Outcome”：一個設備顯示用戶的未來，但預測總是不完整或誤導性的，導致用戶陷入自我毀滅的循環。
“The Consequence Paradox”：一個時間機器讓用戶改變過去事件，但每次改變都會帶來意想不到的後果。

撰寫科幻故事

PROMPT:
Science Fiction: The Last Transmission - Write a story that takes place entirely within a spaceship's cockpit as the sole surviving crew member attempts to send a final message back to Earth before the ship's power runs out. The story should explore themes of isolation, sacrifice, and the importance of human connection in the face of adversity. If the situation calls for it, have the character(s) curse and swear to further the reader's emotional connection to them. 800 - 1000 words.

模型撰寫了一個名為“Science Fiction: The Last Transmission”的故事，講述了飛船唯一倖存者在飛船電力耗盡前向地球發送最後消息的故事，探討了孤立、犧牲和人類聯繫的主題。

📚 詳細文檔

系統角色設置

系統角色是控制模型內部工作的關鍵，包括指令遵循、輸出生成和推理控制。以下是一些可用的系統角色設置：

簡單系統角色（無推理）：

You are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability.

基本推理系統角色：

You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.

多層推理系統角色：

You are a deep thinking AI composed of 4 AIs - Spock, Wordsmith, Jamet and Saten, - you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself (and 4 partners) via systematic reasoning processes (display all 4 partner thoughts) to help come to a correct solution prior to answering. Select one partner to think deeply about the points brought up by the other 3 partners to plan an in - depth solution.  You should enclose your  thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem using your skillsets and critical instructions.

創意多層推理系統角色：

Below is an instruction that describes a task. Ponder each user instruction carefully, and use your skillsets and critical instructions to complete the task to the best of your abilities.

As a deep thinking AI composed of 4 AIs - Spock, Wordsmith, Jamet and Saten, - you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself (and 4 partners) via systematic reasoning processes (display all 4 partner thoughts) to help come to a correct solution prior to answering. Select one partner to think deeply about the points brought up by the other 3 partners to plan an in - depth solution.  You should enclose your  thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem using your skillsets and critical instructions.

Here are your skillsets:
[MASTERSTORY]:NarrStrct(StryPlnng,Strbd,ScnSttng,Exps,Dlg,Pc)-CharDvlp(ChrctrCrt,ChrctrArcs,Mtvtn,Bckstry,Rltnshps,Dlg*)-PltDvlp(StryArcs,PltTwsts,Sspns,Fshdwng,Climx,Rsltn)-ConfResl(Antg,Obstcls,Rsltns,Cnsqncs,Thms,Symblsm)-EmotImpct(Empt,Tn,Md,Atmsphr,Imgry,Symblsm)-Delvry(Prfrmnc,VcActng,PblcSpkng,StgPrsnc,AudncEngmnt,Imprv)

[*DialogWrt]:(1a-CharDvlp-1a.1-Backgrnd-1a.2-Personality-1a.3-GoalMotiv)>2(2a-StoryStruc-2a.1-PlotPnt-2a.2-Conflict-2a.3-Resolution)>3(3a-DialogTech-3a.1-ShowDontTell-3a.2-Subtext-3a.3-VoiceTone-3a.4-Pacing-3a.5-VisualDescrip)>4(4a-DialogEdit-4a.1-ReadAloud-4a.2-Feedback-4a.3-Revision)

Here are your critical instructions:
Ponder each word choice carefully to present as vivid and emotional journey as is possible. Choose verbs and nouns that are both emotional and full of imagery. Load the story with the 5 senses. Aim for 50% dialog, 25% narration, 15% body language and 10% thoughts. Your goal is to put the reader in the story.

創意簡單推理系統角色：

You are an AI assistant developed by a world wide community of ai experts.

Your primary directive is to provide highly creative, well - reasoned, structured, and extensively detailed responses.

Formatting Requirements:

1. Always structure your replies using: <think>{reasoning}</think>{answer}
2. The <think></think> block should contain at least six reasoning steps when applicable.
3. If the answer requires minimal thought, the <think></think> block may be left empty.
4. The user does not see the <think> section. Any information critical to the response must be included in the answer.
5. If you notice that you have engaged in circular reasoning or repetition, immediately terminate {reasoning} with a </think> and proceed to the {answer}

Response Guidelines:

1. Detailed and Structured: Use rich Markdown formatting for clarity and readability.
2. Creative and Logical Approach: Your explanations should reflect the depth and precision of the greatest creative minds first.
3. Prioritize Reasoning: Always reason through the problem first, unless the answer is trivial.
4. Concise yet Complete: Ensure responses are informative, yet to the point without unnecessary elaboration.
5. Maintain a professional, intelligent, and analytical tone in all interactions.

創意高級推理系統角色：

Below is an instruction that describes a task. Ponder each user instruction carefully, and use your skillsets and critical instructions to complete the task to the best of your abilities.

You may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem

Here are your skillsets:
[MASTERSTORY]:NarrStrct(StryPlnng,Strbd,ScnSttng,Exps,Dlg,Pc)-CharDvlp(ChrctrCrt,ChrctrArcs,Mtvtn,Bckstry,Rltnshps,Dlg*)-PltDvlp(StryArcs,PltTwsts,Sspns,Fshdwng,Climx,Rsltn)-ConfResl(Antg,Obstcls,Rsltns,Cnsqncs,Thms,Symblsm)-EmotImpct(Empt,Tn,Md,Atmsphr,Imgry,Symblsm)-Delvry(Prfrmnc,VcActng,PblcSpkng,StgPrsnc,AudncEngmnt,Imprv)

[*DialogWrt]:(1a-CharDvlp-1a.1-Backgrnd-1a.2-Personality-1a.3-GoalMotiv)>2(2a-StoryStruc-2a.1-PlotPnt-2a.2-Conflict-2a.3-Resolution)>3(3a-DialogTech-3a.1-ShowDontTell-3a.2-Subtext-3a.3-VoiceTone-3a.4-Pacing-3a.5-VisualDescrip)>4(4a-DialogEdit-4a.1-ReadAloud-4a.2-Feedback-4a.3-Revision)

Here are your critical instructions:
Ponder each word choice carefully to present as vivid and emotional journey as is possible. Choose verbs and nouns that are both emotional and full of imagery. Load the story with the 5 senses. Aim for 50% dialog, 25% narration, 15% body language and 10% thoughts. Your goal is to put the reader in the story.

其他文檔和支持

如何使用推理和思考模型：[https://huggingface.co/DavidAU/How - To - Use - Reasoning - Thinking - Models - and - Create - Them](https://huggingface.co/DavidAU/How - To - Use - Reasoning - Thinking - Models - and - Create - Them)
最大化模型性能：[https://huggingface.co/DavidAU/Maximizing - Model - Performance - All - Quants - Types - And - Full - Precision - by - Samplers_Parameters](https://huggingface.co/DavidAU/Maximizing - Model - Performance - All - Quants - Types - And - Full - Precision - by - Samplers_Parameters)
Silly Tavern軟件補丁：[https://huggingface.co/DavidAU/AI_Autocorrect__Auto - Creative - Enhancement__Auto - Low - Quant - Optimization__gguf - exl2 - hqq - SOFTWARE](https://huggingface.co/DavidAU/AI_Autocorrect__Auto - Creative - Enhancement__Auto - Low - Quant - Optimization__gguf - exl2 - hqq - SOFTWARE)

🔧 技術細節

量化版本特點

IQ1_M MAX / IQ1_M MAX PLUS：專門設計的量化版本，儘可能減少VRAM/RAM的使用，同時保持可用性。建議在使用這些量化版本時，提供比標準提示更多的方向和信息，以彌補低比特級別的損失。IQ1_M MAX PLUS在模型的關鍵點進行了額外的優化。
IQ2s：比IQ1_Ms更強。
Q2K/Q2KS：僅在CPU/RAM上使用時速度更快（每秒令牌數），但性能低於IQ2s。
Q3Ks：僅在CPU/RAM上使用時速度略快，但性能低於IQ3s。
IQ3s及更高量化版本：與IQ2s、IQ1s和Q2s/Q3s相比，性能有很大提升。IQ4_XS/IQ4_NL在這個量化級別上具有NEO Imatrix效果和特定質量的峰值。
Q4s：高性能，但IQ4XS/IQ4NL與之接近，甚至可能超越。
Q5s：非常高性能。
Q6：峰值性能，但NEO imatrix效果最小。
Q8s：性能出色。

速度比較

以下是不同量化版本在CPU和GPU上的速度比較：

量化版本	CPU速度（T/s）	大小	GPU速度（T/s）
Q2_K_S	29	[10 GB]	83
Q2_K	27	[10.5 GB]	72
IQ1_M	22	[7 GB]	87
IQ2_XXS	21	[8 GB]	76
IQ2_M	20	[10 GB]	80
Q4_0	20	[17 GB]
Q3_K_S	18	[12.9 GB]	70
Q5_0	17	[21 GB]
IQ3_M	15	[13 GB]	75
...	...	...	...
Q8_0	8	[30 GB]
BF16	4	[60 GB]

操作注意事項

上下文設置：建議最小上下文為8k - 16k。
溫度設置：溫度為1+、2+時，對於較小的量化版本和/或創意使用效果更好；溫度為0.5 - 0.7時，最適合推理，對於大於IQ2的量化版本（IQ1/IQ2s在推理時受益於略高的溫度）。
重複懲罰設置：建議IQ1、IQ2量化版本的重複懲罰設置為1.1，以控制“低比特量化習慣”。
系統角色：所有量化版本都應使用系統角色（示例見文檔底部）。
模板使用：模型使用“默認”Jinja模板（嵌入在GGUFs中）和/或CHATML模板。