Qwen3-8b-192k开源模型 - 支持长文处理，免费用于边缘设备部署

首页

Qwen3 8b 192k Context 6X Josiefied Uncensored MLX AWQ 4bit

由 Goraint 开发

Qwen3-8B的4位AWQ量化版本，专为MLX库优化，支持19.2万词元长上下文处理，适用于边缘设备部署。

大型语言模型开源协议:Apache-2.0 #苹果芯片优化 #192k长上下文 #4位高效量化

下载量 204

发布时间 : 5/15/2025

模型简介

基于Qwen3-8B的4位量化模型，通过MLX库实现苹果芯片高效推理，保留原模型核心能力的同时降低资源消耗。

模型特点

高效推理

4位量化使内存占用较FP16降低约75%

长上下文支持

19.2万词元处理能力（标准版6倍）

苹果芯片优化

通过MLX库实现M1/M3芯片加速

边缘设备部署

低资源消耗适合本地设备运行

模型能力

长文本生成

对话式交互

文档分析

代码生成

使用案例

研究

长上下文NLP实验

支持超长文本序列的语言建模研究

模型压缩研究

4位量化技术的效果验证

开发

边缘设备聊天机器人

在苹果设备部署本地化对话系统

M3 Ultra实测112.8词元/秒

长文档处理

书籍/论文等长文本分析与摘要生成

企业应用

代码生成

基于长上下文生成完整代码片段

🚀 Qwen3-8B 4位AWQ量化版本

本项目是Qwen3-8B的4位AWQ量化版本，借助MLX库进行了高效推理优化。它专为处理长上下文任务（192k令牌）而设计，能在减少资源使用的同时，保留Qwen3-8B的核心能力，还支持在边缘设备上部署。

🚀 快速开始

安装

# 仅适用于苹果硅芯片设备安装MLX
pip install mlx

# 使用Hugging Face Transformers加载模型
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Goraint/Qwen3-8b-192k-Context-6X-Josiefied-Uncensored-MLX-AWQ-4bit", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Goraint/Qwen3-8b-192k-Context-6X-Josiefied-Uncensored-MLX-AWQ-4bit")

示例用法

prompt = "Explain quantum computing in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to("mps")
outputs = model.generate(**inputs, max_length=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

✨ 主要特性

高效推理：4位量化与FP16相比，可减少约75%的内存占用。
长上下文支持：支持192k令牌，适用于复杂任务，如文档分析、代码生成。
跨平台：可在搭载苹果硅芯片的macOS系统上运行，借助MLX实现加速。
可定制提示：可调整提示模板，以兼容LM Studio等工具。

📦 安装指南

# 仅适用于苹果硅芯片设备安装MLX
pip install mlx

# 使用Hugging Face Transformers加载模型
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Goraint/Qwen3-8b-192k-Context-6X-Josiefied-Uncensored-MLX-AWQ-4bit", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Goraint/Qwen3-8b-192k-Context-6X-Josiefied-Uncensored-MLX-AWQ-4bit")

💻 使用示例

基础用法

prompt = "Explain quantum computing in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to("mps")
outputs = model.generate(**inputs, max_length=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

📚 详细文档

概述

这是一个经过4位AWQ量化的Qwen3-8B版本，通过MLX库进行了高效推理优化，旨在以较低的资源消耗处理长上下文任务（192k令牌）。在保留Qwen3-8B核心能力的同时，支持在边缘设备上部署。

性能指标

指标	值
模型大小	~4.38 GB（4位量化）
推理速度	30.58令牌/秒（M1 MAX） 112.80令牌/秒（M3 ULTRA） gguf Q4_K_S：8.14令牌/秒（M1 MAX）
上下文支持	192,000令牌

重要提示：LM Studio使用的提示模板

你需要修改提示模板，以确保与LM Studio的推理管道兼容。以下是所需的模板结构：

{%- if tools %}
    {{- '\/system\n' }}
    {%- if messages[0].role == 'system' %}
        {{- messages[0].content + '\n\n' }}
    {%- endif %}
    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call>...</tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call>\n" }}
{%- else %}
    {%- if messages[0].role == 'system' %}
        {{- '\/system\n' + messages[0].content + '\/\n' }}
    {%- endif %}
{%- endif %}

{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
    {%- set index = (messages|length - 1) - loop.index0 %}
    {%- set tool_start = "ÔΩü" %}
    {%- set tool_start_length = tool_start|length %}
    {%- set start_of_message = message.content[:tool_start_length] %}
    {%- set tool_end = "ÔΩ†" %}
    {%- set tool_end_length = tool_end|length %}
    {%- set start_pos = (message.content|length) - tool_end_length %}
    {%- if start_pos < 0 %}
        {%- set start_pos = 0 %}
    {%- endif %}
    {%- set end_of_message = message.content[start_pos:] %}
    {%- if ns.multi_step_tool and message.role == "user" and not(start_of_message == tool_start and end_of_message == tool_end) %}
        {%- set ns.multi_step_tool = false %}
        {%- set ns.last_query_index = index %}
    {%- endif %}
{%- endfor %}

{%- for message in messages %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
        {{- '\/' + message.role + '\n' + message.content + '\/' + '\n' }}
    {%- elif message.role == "assistant" %}
        {%- set content = message.content %}
        {%- set reasoning_content = '' %}
        {%- if message.reasoning_content is defined and message.reasoning_content is not none %}
            {%- set reasoning_content = message.reasoning_content %}
        {%- else %}
            {%- if '\/' in message.content %}
                {%- set content = (message.content.split('\/')|last).lstrip('\n') %}
                {%- set reasoning_content = (message.content.split('\/')|first).rstrip('\n') %}
                {%- set reasoning_content = (reasoning_content.split('')|last).lstrip('\n') %}
            {%- endif %}
        {%- endif %}

        {%- if loop.index0 > ns.last_query_index %}
            {%- if loop.last or (not loop.last and reasoning_content) %}
                {{- '\/' + message.role + '\n\n' + reasoning_content.strip('\n') + '\n\/\n' + content.lstrip('\n') }}
            {%- else %}
                {{- '\/' + message.role + '\n' + content }}
            {%- endif %}
        {%- else %}
            {{- '\/' + message.role + '\n' + content }}
        {%- endif %}

        {%- if message.tool_calls %}
            {%- for tool_call in message.tool_calls %}
                {%- if (loop.first and content) or (not loop.first) %}
                    {{- '\n' }}
                {%- endif %}
                {%- if tool_call.function %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}
                {{- '<tool_call>\n{"name": "' }}
                {{- tool_call.name }}
                {{- '", "arguments": ' }}
                {%- if tool_call.arguments is string %}
                    {{- tool_call.arguments }}
                {%- else %}
                    {{- tool_call.arguments | tojson }}
                {%- endif %}
                {{- '}\n</tool_call>' }}
            {%- endfor %}
        {%- endif %}
        {{- '\/\n' }}
    {%- elif message.role == "tool" %}
        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
            {{- '\/user' }}
        {%- endif %}
        {{- '\nÔΩü\n' }}
        {{- message.content }}
        {{- '\nÔΩ†' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '\/\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}

{%- if add_generation_prompt %}
    {{- '\/assistant\n' }}
    {%- if enable_thinking is defined and enable_thinking is false %}
        {{- '

模型详情

属性	详情
基础模型	Qwen3-8B
量化方式	通过MLX库进行AWQ Q4（4位）量化
上下文长度	192,000令牌（比标准长6倍）
库	MLX（针对苹果硅芯片、macOS优化）
许可证	Apache 2.0
管道	`text-generation`
标签	`not-for-all-audiences`，`conversational`，`mlx`