Qwen3-8b-192kオープンソースモデル - 長文処理に対応し、エッジデバイスへのデプロイに無料で使用可能

ホーム

Qwen3 8b 192k Context 6X Josiefied Uncensored MLX AWQ 4bit

Goraintによって開発

Qwen3-8Bの4ビットAWQ量子化バージョン、MLXライブラリ向けに最適化され、19.2万トークンの長文コンテキスト処理をサポート、エッジデバイス向けのデプロイメントに適しています。

大規模言語モデルオープンソースライセンス:Apache-2.0 #Appleチップ最適化 #192k長文コンテキスト #4ビット効率的量子化

ダウンロード数 204

リリース時間 : 5/15/2025

モデル概要

Qwen3-8Bベースの4ビット量子化モデル、MLXライブラリによりAppleチップで効率的な推論を実現、元モデルのコア能力を保持しつつリソース消費を低減。

モデル特徴

効率的な推論

4ビット量子化によりFP16比でメモリ使用量を約75%削減

長文コンテキストサポート

19.2万トークン処理能力（標準版の6倍）

Appleチップ最適化

MLXライブラリによるM1/M3チップの高速化

エッジデバイスデプロイメント

低リソース消費でローカルデバイスでの実行に適している

モデル能力

長文テキスト生成

対話型インタラクション

ドキュメント分析

コード生成

使用事例

研究

長文コンテキストNLP実験

超長文テキストシーケンスの言語モデリング研究をサポート

モデル圧縮研究

4ビット量子化技術の効果検証

開発

エッジデバイスチャットボット

Appleデバイスにローカル対話システムをデプロイ

M3 Ultra実測112.8トークン/秒

長文ドキュメント処理

書籍/論文などの長文テキスト分析と要約生成

企業アプリケーション

コード生成

長文コンテキストに基づく完全なコードスニペット生成

🚀 Qwen3-8B 4-bit AWQ量子化バージョン

このモデルは、MLXライブラリを使用して効率的な推論を行うために最適化されたQwen3-8Bの4-bit AWQ量子化バージョンです。長文脈タスク（192kトークン）を低リソースで処理でき、Qwen3-8Bの核心機能を維持しつつ、エッジデバイスへのデプロイを可能にします。

🚀 クイックスタート

インストール

# MLXのインストール（Apple Siliconのみ）
pip install mlx

# Hugging Face Transformersを使用してモデルをロード
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Goraint/Qwen3-8b-192k-Context-6X-Josiefied-Uncensored-MLX-AWQ-4bit", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Goraint/Qwen3-8b-192k-Context-6X-Josiefied-Uncensored-MLX-AWQ-4bit")

使用例

prompt = "Explain quantum computing in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to("mps")
outputs = model.generate(**inputs, max_length=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

✨ 主な機能

効率的な推論：4-bit量子化により、FP16と比較してメモリ使用量を約75％削減します。
長文脈対応：192kトークンをサポートし、文書分析やコード生成などの複雑なタスクに対応します。
クロスプラットフォーム：MLXを使用してApple Siliconで加速されるmacOSで動作します。
カスタマイズ可能なプロンプト：LM Studioなどのツールとの互換性のためにテンプレートを調整できます。

📦 インストール

# MLXのインストール（Apple Siliconのみ）
pip install mlx

# Hugging Face Transformersを使用してモデルをロード
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Goraint/Qwen3-8b-192k-Context-6X-Josiefied-Uncensored-MLX-AWQ-4bit", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Goraint/Qwen3-8b-192k-Context-6X-Josiefied-Uncensored-MLX-AWQ-4bit")

💻 使用例

基本的な使用法

prompt = "Explain quantum computing in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to("mps")
outputs = model.generate(**inputs, max_length=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

📚 ドキュメント

性能指標

メトリック	値
モデルサイズ	~4.38 GB (4-bit量子化)
推論速度	30.58 tokens/sec (M1 MAX) 112.80 tokens/sec (M3 ULTRA) gguf Q4_K_S: 8.14 tokens/sec (M1 MAX)
コンテキストサポート	192,000 tokens

LM Studioでの使用に必要なプロンプトテンプレート

LM Studioの推論パイプラインとの互換性を確保するために、プロンプトテンプレートを変更する必要があります。以下は必要なテンプレート構造です。

{%- if tools %}
    {{- '\/system\n' }}
    {%- if messages[0].role == 'system' %}
        {{- messages[0].content + '\n\n' }}
    {%- endif %}
    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call>...</tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call>\n" }}
{%- else %}
    {%- if messages[0].role == 'system' %}
        {{- '\/system\n' + messages[0].content + '\/\n' }}
    {%- endif %}
{%- endif %}

{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
    {%- set index = (messages|length - 1) - loop.index0 %}
    {%- set tool_start = "ÔΩü" %}
    {%- set tool_start_length = tool_start|length %}
    {%- set start_of_message = message.content[:tool_start_length] %}
    {%- set tool_end = "ÔΩ†" %}
    {%- set tool_end_length = tool_end|length %}
    {%- set start_pos = (message.content|length) - tool_end_length %}
    {%- if start_pos < 0 %}
        {%- set start_pos = 0 %}
    {%- endif %}
    {%- set end_of_message = message.content[start_pos:] %}
    {%- if ns.multi_step_tool and message.role == "user" and not(start_of_message == tool_start and end_of_message == tool_end) %}
        {%- set ns.multi_step_tool = false %}
        {%- set ns.last_query_index = index %}
    {%- endif %}
{%- endfor %}

{%- for message in messages %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
        {{- '\/' + message.role + '\n' + message.content + '\/' + '\n' }}
    {%- elif message.role == "assistant" %}
        {%- set content = message.content %}
        {%- set reasoning_content = '' %}
        {%- if message.reasoning_content is defined and message.reasoning_content is not none %}
            {%- set reasoning_content = message.reasoning_content %}
        {%- else %}
            {%- if '\/' in message.content %}
                {%- set content = (message.content.split('\/')|last).lstrip('\n') %}
                {%- set reasoning_content = (message.content.split('\/')|first).rstrip('\n') %}
                {%- set reasoning_content = (reasoning_content.split('')|last).lstrip('\n') %}
            {%- endif %}
        {%- endif %}

        {%- if loop.index0 > ns.last_query_index %}
            {%- if loop.last or (not loop.last and reasoning_content) %}
                {{- '\/' + message.role + '\n\n' + reasoning_content.strip('\n') + '\n\/\n' + content.lstrip('\n') }}
            {%- else %}
                {{- '\/' + message.role + '\n' + content }}
            {%- endif %}
        {%- else %}
            {{- '\/' + message.role + '\n' + content }}
        {%- endif %}

        {%- if message.tool_calls %}
            {%- for tool_call in message.tool_calls %}
                {%- if (loop.first and content) or (not loop.first) %}
                    {{- '\n' }}
                {%- endif %}
                {%- if tool_call.function %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}
                {{- '<tool_call>\n{"name": "' }}
                {{- tool_call.name }}
                {{- '", "arguments": ' }}
                {%- if tool_call.arguments is string %}
                    {{- tool_call.arguments }}
                {%- else %}
                    {{- tool_call.arguments | tojson }}
                {%- endif %}
                {{- '}\n</tool_call>' }}
            {%- endfor %}
        {%- endif %}
        {{- '\/\n' }}
    {%- elif message.role == "tool" %}
        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
            {{- '\/user' }}
        {%- endif %}
        {{- '\nÔΩü\n' }}
        {{- message.content }}
        {{- '\nÔΩ†' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '\/\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}

{%- if add_generation_prompt %}
    {{- '\/assistant\n' }}
    {%- if enable_thinking is defined and enable_thinking is false %}
        {{- '