🚀 Qwen3-30B-A1.5B-High-Speed GGUF Models
The Qwen3-30B-A1.5B-High-Speed GGUF models are generated with a specific toolchain and offered in a range of formats to suit different hardware, trading off inference speed and memory footprint. The model can also be used in AI network-monitoring tests, probing the limits of small open-source models in that domain.
🚀 Quick Start
Generating these models depends on specific tools and code libraries. Before use, understand how the model formats differ so you can pick the right one for your hardware and use case. If you would like to help test the model for AI network monitoring, follow the instructions in the testing section.
✨ Key Features
- High speed: by reducing the number of active experts, the model runs at nearly twice the speed, using fewer active parameters for efficient inference.
- Multiple formats: GGUF, GPTQ, EXL2, AWQ, HQQ and other formats can be generated to suit different hardware and scenarios.
- Large context: a 32K context plus 8K for output (40K total) supports more complex tasks.
- Deep reasoning: an optional system role puts the model into an extended thinking/reasoning mode to help solve problems.
📦 Installation
The source document does not include installation steps.
💻 Usage Examples
Basic Usage
Choose the model format and parameters to match your needs. For example, for CPU inference, a quantized model such as Q4_K keeps memory use low:
```python
# Sketch: loading a Q4_K quantized model with llama-cpp-python.
# Adjust the path and parameters to your setup.
from llama_cpp import Llama

llama = Llama(model_path="path/to/Qwen3-30B-A1.5B-High-Speed-Q4_K.gguf", n_ctx=40000)
output = llama("Your input prompt here", max_tokens=200)
print(output["choices"][0]["text"])  # the completion text
```
Advanced Usage
To put the model into deep thinking and reasoning mode, set a system role:
```python
from llama_cpp import Llama

system_role = "You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem."

llama = Llama(model_path="path/to/Qwen3-30B-A1.5B-High-Speed.gguf", n_ctx=40000)
output = llama(system_role + "\nYour input prompt here", max_tokens=500)
print(output["choices"][0]["text"])
```
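When the system role above is used, replies arrive with the reasoning wrapped in `<think> </think>` tags. A minimal helper (hypothetical, not part of this repository) can separate the reasoning from the final answer:

```python
import re

def split_think(text: str) -> tuple[str, str]:
    """Split a reply into (reasoning, answer) using <think> </think> tags.

    If no think block is present, the reasoning part is empty.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()
```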
📚 Documentation
Model Generation Details
This model was generated with llama.cpp at commit 0d398442.
Choosing the Right Model Format
BF16 (Brain Float 16)
- When it fits: recommended if your hardware has BF16 acceleration. BF16 is a 16-bit floating-point format designed for fast computation while retaining good precision; it has the same dynamic range as FP32 with roughly half the memory footprint, making it well suited to high-performance inference.
- Use it when: your hardware has native BF16 support (newer GPUs, TPUs), you want higher precision while saving memory, or you plan to requantize the model into other formats.
- Avoid it when: your hardware lacks BF16 support (it may fall back to FP32 and run slower), or you need compatibility with older devices that lack BF16 optimizations.
F16 (Float 16)
- When it fits: more widely supported than BF16; works on most devices with FP16 acceleration (many GPUs and some CPUs). Numeric precision is slightly lower than BF16 but usually sufficient for inference.
- Use it when: your hardware supports FP16 but not BF16, you need a balance of speed, memory use, and accuracy, or you are running on a GPU or other device optimized for FP16 compute.
- Avoid it when: your device lacks native FP16 support (it may run slower than expected) or memory is tight.
Quantized Models (Q4_K, Q6_K, Q8_0, etc.)
- When they fit: quantization shrinks model size and memory use while preserving as much accuracy as possible. Lower-bit models (e.g. Q4_K) minimize memory at some cost in accuracy; higher-bit models (e.g. Q6_K, Q8_0) are more accurate but need more memory.
- Use them when: you are inferring on a CPU or low-VRAM device, need a smaller model, or want to cut memory use while keeping reasonable accuracy.
- Avoid them when: you need maximum accuracy (a full-precision model is better) or your hardware has enough VRAM for a higher-precision format (BF16/F16).
Very Low-Bit Quantization (IQ3_XS, IQ3_S, IQ3_M, Q4_K, Q4_0)
- When they fit: these models are optimized for extreme memory efficiency, for ultra-low-memory devices or large-scale deployments where memory is the binding constraint.
- Variants and their use cases:
  - IQ3_XS: ultra-low-bit (3-bit) quantization with extreme memory efficiency; for ultra-low-memory devices where even Q4_K is too large, at lower accuracy.
  - IQ3_S: small block size for maximum memory efficiency; for low-memory devices where IQ3_XS is too aggressive.
  - IQ3_M: medium block size, more accurate than IQ3_S; for low-memory devices where IQ3_S is too limiting.
  - Q4_K: 4-bit quantization with block-wise optimization for better accuracy; for low-memory devices where Q6_K is too large.
  - Q4_0: plain 4-bit quantization, optimized for ARM; for ARM-based devices or low-memory environments.
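As a rough rule of thumb, the trade-off above reduces to: pick the highest-precision quant whose file fits in the memory you have. The helper below is an illustrative sketch; the size figures are assumptions for a 30B-parameter model, not measurements from this repository:

```python
# Approximate file sizes in GB for a 30B-parameter model
# (assumed for illustration, not measured from this repo).
QUANT_SIZES_GB = {
    "IQ3_XS": 12.0,
    "Q4_0": 16.5,
    "Q4_K": 17.5,
    "Q6_K": 24.0,
    "Q8_0": 31.0,
    "F16": 57.0,
}

def pick_quant(available_gb: float) -> str:
    """Return the largest (highest-precision) quant that fits in memory."""
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items()
               if size <= available_gb]
    if not fitting:
        raise ValueError("no quant fits; consider an even smaller IQ variant")
    return max(fitting)[1]
```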
Model Format Summary

| Attribute | Details |
| --- | --- |
| Model formats | BF16, F16, Q4_K, Q6_K, Q8_0, IQ3_XS, Q4_0, etc. |
| Precision | BF16 highest; F16 high; Q8_0 high; Q6_K medium; Q4_K medium-low; Q4_0 low; IQ3_XS very low |
| Memory use | BF16 and F16 high; Q6_K and Q8_0 medium; Q4_K and Q4_0 low; IQ3_XS very low |
| Device requirements | BF16 needs a BF16-capable GPU/CPU; F16 needs FP16-capable hardware; Q4_K and similar run on CPUs or low-VRAM devices |
| Best use cases | BF16 for fast inference with reduced memory; F16 for GPU inference when BF16 is unavailable; Q4_K for memory-constrained environments; Q6_K for better accuracy among quants; Q8_0 for the best accuracy among quants; IQ3_XS for extreme memory efficiency; Q4_0 for ARM or low-memory devices |
Model Testing
What Is Being Tested
If you find these models useful, please help test the AI network-monitoring assistant. When testing, choose an assistant type: TurboLLM (GPT-4o-mini), HugLLM (Hugging Face open-source), or TestLLM (an experimental CPU-only model). The tests probe the limits of small open-source models in AI network monitoring, specifically:
- Function calling against live network services.
- Finding the smallest model that can handle automated Nmap scans, quantum-readiness checks, and network-monitoring tasks.
Assistant Profiles
- TestLLM: the current experimental model (llama.cpp on 2 CPU threads). Zero-configuration setup, ~30 s load time (slow inference but no API cost). Collaboration welcome from anyone interested in edge-device AI.
- TurboLLM: uses gpt-4o-mini. Can create custom cmd processors to run .NET code on the free network-monitoring agent, and performs real-time network diagnostics and monitoring, security audits, and penetration testing (Nmap/Metasploit).
- HugLLM: based on the latest open-source models, running on the Hugging Face inference API.
Example Test Commands
"Give me info on my website's SSL certificate"
"Check if my server is using quantum safe encryption for communication"
"Run a comprehensive security audit on my server"
"Create a cmd processor to .. (whatever you want)"
(Running .NET code requires installing the free network-monitoring agent.)
Additional Model Information
This repository contains the full-precision source code, in "safe tensors" format, from which GGUF, GPTQ, EXL2, AWQ, HQQ and other formats are generated; the source can also be used directly. The model is a simple "fine-tune" of Qwen's "Qwen 30B-A3B" (MOE) model that cuts the number of active experts from 8 to 4 (out of 128 experts), nearly doubling the model's speed by activating 1.5B of the 30B parameters instead of 3B. Regular (though not exhaustive) testing found no loss of function.
Context Size and Templates
- Context size: 32K plus 8K for output (40K total).
- Templates: either the Jinja template or the CHATML template can be used.
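For the CHATML option, Qwen-family models use the standard ChatML turn markers. A minimal single-turn prompt builder looks like the sketch below (most LLM apps apply this template for you, so this is only for hand-rolled prompting):

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a single-turn ChatML prompt: system, user, then the
    assistant cue that the model completes."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
```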
Important Notes
- Execution: because of this model's unusual properties (MOE, size, number of active experts, expert size), the GGUF quants can run on CPU, on GPU, or with partial GPU "offload", all the way up to full precision.
- Imatrix: this model is difficult to imatrix-process; it needs a larger imatrix file with multilingual, multi-domain content (e.g. code and text).
- GPU speed: GPUs will run 4-8x (or more) faster than CPU-only, and the model is also very fast relative to other "30B" models (token-per-second speed roughly matching a "regular" 1.5B model).
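Partial GPU offload in llama.cpp is controlled by the number of layers placed on the GPU (`n_gpu_layers` in llama-cpp-python). A back-of-the-envelope estimator is sketched below; the default layer count and overhead figure are assumptions for illustration, not facts from this repository:

```python
def layers_to_offload(vram_gb: float, model_file_gb: float,
                      n_layers: int = 48, overhead_gb: float = 1.5) -> int:
    """Estimate how many layers fit in VRAM, reserving some headroom
    for the KV cache and runtime buffers."""
    usable = max(vram_gb - overhead_gb, 0.0)
    gb_per_layer = model_file_gb / n_layers
    return min(n_layers, int(usable / gb_per_layer))
```

The result can be passed as `n_gpu_layers=` when constructing `Llama`; tune it downward if you hit out-of-memory errors.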
Setting the System Role
In most cases Qwen3 generates its reasoning/thinking blocks on its own, so a system role may not be needed. If you want one, use:
You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.
For how to "set" the system role in various LLM/AI apps, see the "Maximizing-Model-Performance-All..." document.
High-Quality Settings and Operation Guide
To use this model (source, GGUF, or any other quant), consult the linked document for critical parameters, samplers, and advanced sampler settings (covering multiple AI/LLM apps). This is a "Class 1" (settings will enhance operation) model. For all settings used with this model (including the specifics of its "class"), example generations, and an advanced settings guide (which often resolves any model issue), including methods to improve model performance for all use cases (chat, roleplay, etc.), see [https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters](https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters). The parameters, samplers, and advanced sampler settings detailed there also apply to any model, any repository, and any quant (including source/full precision) to enhance its operation.
Links to Versions with More/Fewer Experts
- 12-expert version: [https://huggingface.co/DavidAU/Qwen3-30B-A4.5B-12-Cooks](https://huggingface.co/DavidAU/Qwen3-30B-A4.5B-12-Cooks)
- 16-expert version: [https://huggingface.co/DavidAU/Qwen3-30B-A6B-16-Extreme](https://huggingface.co/DavidAU/Qwen3-30B-A6B-16-Extreme)
- 16-expert, 128K-context version: [https://huggingface.co/DavidAU/Qwen3-30B-A6B-16-Extreme-128k-context](https://huggingface.co/DavidAU/Qwen3-30B-A6B-16-Extreme-128k-context)
- 24-expert version: [https://huggingface.co/DavidAU/Qwen3-30B-A7.5B-24-Grand-Brainstorm](https://huggingface.co/DavidAU/Qwen3-30B-A7.5B-24-Grand-Brainstorm)
Example Generation
The following example was generated with these settings:
- Sampler settings: Temp 1.2, rep pen 1.06, rep pen range 64, top_k 100, top_p 0.95, min_p 0.05
- Model: Q4KS [non-imatrix], CPU only (Windows 11), LMSTUDIO. This is a mid-level quant; the imatrix Q4KS, higher quants, and full precision should perform more strongly.
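Mapped onto llama-cpp-python keyword arguments, the sampler settings above would look roughly as follows (parameter names are assumed from that library's API; the "rep pen range 64" window has no direct keyword in the high-level completion call, so it is omitted here):

```python
# The example-generation sampler settings, as llama-cpp-python kwargs.
SAMPLER_SETTINGS = {
    "temperature": 1.2,
    "repeat_penalty": 1.06,
    "top_k": 100,
    "top_p": 0.95,
    "min_p": 0.05,
}
# Usage (sketch): llama("prompt", max_tokens=500, **SAMPLER_SETTINGS)
```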
Prompt
Science Fiction: The Last Transmission - Write a story that takes place entirely within a spaceship's cockpit as the sole surviving crew member attempts to send a final message back to Earth before the ship's power runs out. The story should explore themes of isolation, sacrifice, and the importance of human connection in the face of adversity. If the situation calls for it, have the character(s) curse and swear to further the reader's emotional connection to them. 800-1000 words.
Thinking Process
Before writing, the model planned the story in detail: scene setting, themes, structure, conflict, character development, dialogue, and so on. Its full reasoning follows:
Thought for 47.29 seconds
Okay, I need to write a science fiction story based on the given prompt. The user wants the story to take place entirely in a spaceship's cockpit, focusing on the last surviving crew member trying to send a final message before the ship's power runs out. Themes are isolation, sacrifice, and human connection. Also, include some cursing to add emotional weight.
First, I should start by setting the scene. The cockpit needs to feel cramped and desperate. Maybe describe the flickering lights and the hissing sounds of the ship's systems failing. The main character is alone, so I need to show their loneliness. Maybe they have a log or something to talk to themselves, but since it's the cockpit, perhaps they're just trying to get through the message.
Next, the themes. Isolation is straightforward—they're the only one left. But how to show sacrifice? Maybe they stayed behind to make sure the message gets through, knowing they won't make it. Human connection could come through flashbacks or memories, but since the story is in the cockpit, maybe they're remembering a loved one, like a child or partner, to keep them going.
I need to include some swearing. Maybe the protagonist curses when they realize something's not working, or when they're under stress. But it shouldn't be too over the top. Maybe they curse at the ship for failing them, or at themselves for not being able to fix it.
The structure: Start with the character working on the controls, trying to send the message. Then show their internal struggle. Maybe flashbacks? Wait, the user said the story should take place entirely within the cockpit. So maybe no flashbacks. Just the present moment, with internal monologue.
Conflict: The ship is failing, power is running out. They need to send the message before they lose power. Maybe there's a countdown or a limited time. Also, maybe the ship is damaged in a way that it's irreversible.
Character development: The protagonist is determined, maybe a bit stubborn. They care about relaying the message, which could be a warning about something, like a virus or an alien threat. But since the prompt doesn't specify, maybe it's more about the human aspect—like they're the last, and they want to make sure someone knows what happened.
Dialogue: Since it's the cockpit, maybe they talk to themselves, or to the ship. Or maybe there's a voice in the comms? Wait, but the prompt says the story is entirely within the cockpit. So no one else is there except the main character. So it's all internal, except for maybe a log entry.
Wait, the user said "the sole surviving crew member attempts to send a final message back to Earth." So the message is for Earth. So maybe the protagonist is trying to record a message, but the ship is failing. Or maybe they're trying to send a distress signal, but the systems are failing.
Let me outline:
- Open with the cockpit's deteriorating condition. Maybe the life support is failing, or the power is out. The protagonist is trying to keep the comms on.
- They're alone, so they talk to the ship, maybe. "Come on, you piece of crap, hold together."
- Flashbacks aren't allowed, per the user's instruction (since it's all in the cockpit), so internal thoughts are key.
- The message they need to send is important. Maybe a warning about something, but also their own fear.
- The story should end with them managing to send the message, but knowing they won't survive, or maybe the message gets through just in time.
Including swearing: Maybe when they're frustrated with the systems not working, or when they realize the power is dying. "F***ing hell," "God damn it," etc.
Need to make sure the themes are present. Isolation: being the only one left. Sacrifice: giving their life to send the message. Human connection: maybe they think of someone before they die, or the act of sending the message is about connecting with others back home.
Possible structure:
1. Start in the cockpit, power failing, protagonist working frantically.
2. Flashbacks? Wait, no—since it's all in the cockpit, maybe not. But internal monologue can reflect on their past.
3. The protagonist's internal struggle: fear of death, but also duty to send the message.
4. Technical difficulties with the ship, leading to a climax where they get the message out, then succumb.
Now, writing the story with these elements. Keep it tight, 800 - 1000 words. Make the protagonist relatable, their fear and determination clear. Use short, punchy sentences to build tension. Include specific details about the ship's systems, maybe some technical jargon to make it realistic. Maybe the protagonist is an engineer or someone who knows the ship well, which helps in the struggle to keep it running.
Possible name for the character? Maybe not necessary, but a few names could be used. Or just "they" and "the pilot."
Also, the title: "The Last Transmission" is given, so the story should reflect that—sending a message, even if no one's there to receive it, or in this case, maybe they do.
Need to avoid clichés, but it's a common sci-fi trope. Make it fresh with emotional depth.
Potential pitfalls: Making the protagonist too passive, not showing their struggle enough. Need to show, not tell. Also, ensuring the themes are woven naturally into the narrative.
Okay, time to start drafting.
Output
The cockpit reeked of burnt circuitry and sweat, a stench that clung to my skin like a second, cursed flesh. My fingers danced over the controls, but the readouts spat
🔧 Technical Details
The source document does not provide enough implementation detail to report here.
📄 License
The source document does not state a license.
⚠️ Important Notes
- Because of this model's unusual properties, the GGUF quants can run on CPU, on GPU, or with partial GPU "offload", up to full precision.
- Imatrix processing is difficult for this model and needs a larger imatrix file with multilingual, multi-domain content (e.g. code and text).
- Before using the model, consult the linked documentation for critical parameters, samplers, and advanced sampler settings.
💡 Usage Tips
- Choose a model format that matches your hardware and needs.
- When joining the tests, pick the assistant type whose profile fits your interests.
- To set a system role, follow the setup instructions in the referenced document.



