Qwen3-8b-192k開源模型 - 支持長文處理，免費用於邊緣設備部署

首頁

Qwen3 8b 192k Context 6X Josiefied Uncensored MLX AWQ 4bit

由Goraint開發

Qwen3-8B的4位AWQ量化版本，專為MLX庫優化，支持19.2萬詞元長上下文處理，適用於邊緣設備部署。

大型語言模型開源協議:Apache-2.0 #蘋果芯片優化 #192k長上下文 #4位高效量化

下載量 204

發布時間 : 5/15/2025

模型概述

基於Qwen3-8B的4位量化模型，通過MLX庫實現蘋果芯片高效推理，保留原模型核心能力的同時降低資源消耗。

模型特點

高效推理

4位量化使內存佔用較FP16降低約75%

長上下文支持

19.2萬詞元處理能力（標準版6倍）

蘋果芯片優化

通過MLX庫實現M1/M3芯片加速

邊緣設備部署

低資源消耗適合本地設備運行

模型能力

長文本生成

對話式交互

文檔分析

代碼生成

使用案例

研究

長上下文NLP實驗

支持超長文本序列的語言建模研究

模型壓縮研究

4位量化技術的效果驗證

開發

邊緣設備聊天機器人

在蘋果設備部署本地化對話系統

M3 Ultra實測112.8詞元/秒

長文檔處理

書籍/論文等長文本分析與摘要生成

企業應用

代碼生成

基於長上下文生成完整代碼片段

🚀 Qwen3-8B 4位AWQ量化版本

本項目是Qwen3-8B的4位AWQ量化版本，藉助MLX庫進行了高效推理優化。它專為處理長上下文任務（192k令牌）而設計，能在減少資源使用的同時，保留Qwen3-8B的核心能力，還支持在邊緣設備上部署。

🚀 快速開始

安裝

# 僅適用於蘋果硅芯片設備安裝MLX
pip install mlx

# 使用Hugging Face Transformers加載模型
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Goraint/Qwen3-8b-192k-Context-6X-Josiefied-Uncensored-MLX-AWQ-4bit", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Goraint/Qwen3-8b-192k-Context-6X-Josiefied-Uncensored-MLX-AWQ-4bit")

示例用法

prompt = "Explain quantum computing in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to("mps")
outputs = model.generate(**inputs, max_length=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

✨ 主要特性

高效推理：4位量化與FP16相比，可減少約75%的內存佔用。
長上下文支持：支持192k令牌，適用於複雜任務，如文檔分析、代碼生成。
跨平臺：可在搭載蘋果硅芯片的macOS系統上運行，藉助MLX實現加速。
可定製提示：可調整提示模板，以兼容LM Studio等工具。

📦 安裝指南

# 僅適用於蘋果硅芯片設備安裝MLX
pip install mlx

# 使用Hugging Face Transformers加載模型
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Goraint/Qwen3-8b-192k-Context-6X-Josiefied-Uncensored-MLX-AWQ-4bit", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Goraint/Qwen3-8b-192k-Context-6X-Josiefied-Uncensored-MLX-AWQ-4bit")

💻 使用示例

基礎用法

prompt = "Explain quantum computing in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to("mps")
outputs = model.generate(**inputs, max_length=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

📚 詳細文檔

概述

這是一個經過4位AWQ量化的Qwen3-8B版本，通過MLX庫進行了高效推理優化，旨在以較低的資源消耗處理長上下文任務（192k令牌）。在保留Qwen3-8B核心能力的同時，支持在邊緣設備上部署。

性能指標

指標	值
模型大小	~4.38 GB（4位量化）
推理速度	30.58令牌/秒（M1 MAX） 112.80令牌/秒（M3 ULTRA） gguf Q4_K_S：8.14令牌/秒（M1 MAX）
上下文支持	192,000令牌

重要提示：LM Studio使用的提示模板

你需要修改提示模板，以確保與LM Studio的推理管道兼容。以下是所需的模板結構：

{%- if tools %}
    {{- '\/system\n' }}
    {%- if messages[0].role == 'system' %}
        {{- messages[0].content + '\n\n' }}
    {%- endif %}
    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call>...</tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call>\n" }}
{%- else %}
    {%- if messages[0].role == 'system' %}
        {{- '\/system\n' + messages[0].content + '\/\n' }}
    {%- endif %}
{%- endif %}

{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
    {%- set index = (messages|length - 1) - loop.index0 %}
    {%- set tool_start = "ÔΩü" %}
    {%- set tool_start_length = tool_start|length %}
    {%- set start_of_message = message.content[:tool_start_length] %}
    {%- set tool_end = "ÔΩ†" %}
    {%- set tool_end_length = tool_end|length %}
    {%- set start_pos = (message.content|length) - tool_end_length %}
    {%- if start_pos < 0 %}
        {%- set start_pos = 0 %}
    {%- endif %}
    {%- set end_of_message = message.content[start_pos:] %}
    {%- if ns.multi_step_tool and message.role == "user" and not(start_of_message == tool_start and end_of_message == tool_end) %}
        {%- set ns.multi_step_tool = false %}
        {%- set ns.last_query_index = index %}
    {%- endif %}
{%- endfor %}

{%- for message in messages %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
        {{- '\/' + message.role + '\n' + message.content + '\/' + '\n' }}
    {%- elif message.role == "assistant" %}
        {%- set content = message.content %}
        {%- set reasoning_content = '' %}
        {%- if message.reasoning_content is defined and message.reasoning_content is not none %}
            {%- set reasoning_content = message.reasoning_content %}
        {%- else %}
            {%- if '\/' in message.content %}
                {%- set content = (message.content.split('\/')|last).lstrip('\n') %}
                {%- set reasoning_content = (message.content.split('\/')|first).rstrip('\n') %}
                {%- set reasoning_content = (reasoning_content.split('')|last).lstrip('\n') %}
            {%- endif %}
        {%- endif %}

        {%- if loop.index0 > ns.last_query_index %}
            {%- if loop.last or (not loop.last and reasoning_content) %}
                {{- '\/' + message.role + '\n\n' + reasoning_content.strip('\n') + '\n\/\n' + content.lstrip('\n') }}
            {%- else %}
                {{- '\/' + message.role + '\n' + content }}
            {%- endif %}
        {%- else %}
            {{- '\/' + message.role + '\n' + content }}
        {%- endif %}

        {%- if message.tool_calls %}
            {%- for tool_call in message.tool_calls %}
                {%- if (loop.first and content) or (not loop.first) %}
                    {{- '\n' }}
                {%- endif %}
                {%- if tool_call.function %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}
                {{- '<tool_call>\n{"name": "' }}
                {{- tool_call.name }}
                {{- '", "arguments": ' }}
                {%- if tool_call.arguments is string %}
                    {{- tool_call.arguments }}
                {%- else %}
                    {{- tool_call.arguments | tojson }}
                {%- endif %}
                {{- '}\n</tool_call>' }}
            {%- endfor %}
        {%- endif %}
        {{- '\/\n' }}
    {%- elif message.role == "tool" %}
        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
            {{- '\/user' }}
        {%- endif %}
        {{- '\nÔΩü\n' }}
        {{- message.content }}
        {{- '\nÔΩ†' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '\/\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}

{%- if add_generation_prompt %}
    {{- '\/assistant\n' }}
    {%- if enable_thinking is defined and enable_thinking is false %}
        {{- '

模型詳情

屬性	詳情
基礎模型	Qwen3-8B
量化方式	通過MLX庫進行AWQ Q4（4位）量化
上下文長度	192,000令牌（比標準長6倍）
庫	MLX（針對蘋果硅芯片、macOS優化）
許可證	Apache 2.0
管道	`text-generation`
標籤	`not-for-all-audiences`，`conversational`，`mlx`