🚀 Qwen3-8B 4-bit AWQ Quantized Model
A 4-bit AWQ quantized version of Qwen3-8B optimized for efficient inference using the MLX library, designed to handle long-context tasks with reduced resource usage.
🚀 Quick Start
Installation
```shell
pip install transformers torch
# Optional, for MLX-native inference on Apple Silicon:
pip install mlx-lm
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Goraint/Qwen3-8b-192k-Context-6X-Josiefied-Uncensored-MLX-AWQ-4bit",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "Goraint/Qwen3-8b-192k-Context-6X-Josiefied-Uncensored-MLX-AWQ-4bit"
)
```
Example Usage
```python
prompt = "Explain quantum computing in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
✨ Features
- Efficient Inference: 4-bit quantization reduces memory footprint by ~75% vs. FP16.
- Long Context Support: 192k tokens for complex tasks (e.g., document analysis, code generation).
- Apple Silicon Ready: Runs on macOS with MLX acceleration on Apple Silicon.
- Customizable Prompting: Adjust templates for compatibility with tools like LM Studio.
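The ~75% memory-reduction figure follows from simple bits-per-weight arithmetic. A back-of-the-envelope sketch (real savings vary slightly, since quantization scales/zeros add overhead and some layers may stay at higher precision):

```python
# Back-of-the-envelope memory estimate for an ~8.2B-parameter model.
params = 8.2e9  # Qwen3-8B parameter count

fp16_gb = params * 16 / 8 / 1e9  # 16 bits per weight
q4_gb = params * 4 / 8 / 1e9     # 4 bits per weight (ignoring scale/zero overhead)

reduction = 1 - q4_gb / fp16_gb
print(f"FP16: ~{fp16_gb:.1f} GB, 4-bit: ~{q4_gb:.1f} GB, reduction: {reduction:.0%}")
```

The per-group scales and zero points explain why the shipped model is ~4.38 GB rather than the idealized ~4.1 GB.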
📦 Model Details
| Property | Details |
| --- | --- |
| Base Model | Qwen3-8B |
| Quantization | AWQ Q4 (4-bit) via the MLX library |
| Context Length | 192,000 tokens (6x the standard window) |
| Library | MLX (optimized for Apple Silicon, macOS) |
| License | Apache 2.0 |
| Pipeline | text-generation |
| Tags | not-for-all-audiences, conversational, mlx |
📊 Performance Metrics
| Metric | Value |
| --- | --- |
| Model Size | ~4.38 GB (4-bit quantized) |
| Inference Speed (this model) | 30.58 tokens/sec (M1 Max), 112.80 tokens/sec (M3 Ultra) |
| Inference Speed (GGUF Q4_K_S, for comparison) | 8.14 tokens/sec (M1 Max) |
| Context Support | 192,000 tokens |
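For intuition, the decode speeds translate into wall-clock generation time by simple division (this ignores prompt-processing time, which grows with context length):

```python
# Rough time to generate a 500-token response at the measured decode speeds.
speeds = {
    "M1 Max (MLX AWQ 4-bit)": 30.58,
    "M3 Ultra (MLX AWQ 4-bit)": 112.80,
    "M1 Max (GGUF Q4_K_S)": 8.14,
}  # tokens/sec

tokens = 500
for hw, tps in speeds.items():
    print(f"{hw}: ~{tokens / tps:.0f} s for {tokens} tokens")
```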
💻 Usage Examples
Advanced Usage
The model ships with a Qwen3-style ChatML chat template (Jinja), which handles system prompts, tool definitions, multi-turn tool calls, and `<think>` reasoning blocks:

```jinja
{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0].role == 'system' %}
        {{- messages[0].content + '\n\n' }}
    {%- endif %}
    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call>...</tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0].role == 'system' %}
        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
    {%- set index = (messages|length - 1) - loop.index0 %}
    {%- set tool_start = "<tool_response>" %}
    {%- set tool_start_length = tool_start|length %}
    {%- set start_of_message = message.content[:tool_start_length] %}
    {%- set tool_end = "</tool_response>" %}
    {%- set tool_end_length = tool_end|length %}
    {%- set start_pos = (message.content|length) - tool_end_length %}
    {%- if start_pos < 0 %}
        {%- set start_pos = 0 %}
    {%- endif %}
    {%- set end_of_message = message.content[start_pos:] %}
    {%- if ns.multi_step_tool and message.role == "user" and not(start_of_message == tool_start and end_of_message == tool_end) %}
        {%- set ns.multi_step_tool = false %}
        {%- set ns.last_query_index = index %}
    {%- endif %}
{%- endfor %}
{%- for message in messages %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {%- set content = message.content %}
        {%- set reasoning_content = '' %}
        {%- if message.reasoning_content is defined and message.reasoning_content is not none %}
            {%- set reasoning_content = message.reasoning_content %}
        {%- else %}
            {%- if '</think>' in message.content %}
                {%- set content = (message.content.split('</think>')|last).lstrip('\n') %}
                {%- set reasoning_content = (message.content.split('</think>')|first).rstrip('\n') %}
                {%- set reasoning_content = (reasoning_content.split('<think>')|last).lstrip('\n') %}
            {%- endif %}
        {%- endif %}
        {%- if loop.index0 > ns.last_query_index %}
            {%- if loop.last or (not loop.last and reasoning_content) %}
                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
            {%- else %}
                {{- '<|im_start|>' + message.role + '\n' + content }}
            {%- endif %}
        {%- else %}
            {{- '<|im_start|>' + message.role + '\n' + content }}
        {%- endif %}
        {%- if message.tool_calls %}
            {%- for tool_call in message.tool_calls %}
                {%- if (loop.first and content) or (not loop.first) %}
                    {{- '\n' }}
                {%- endif %}
                {%- if tool_call.function %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}
                {{- '<tool_call>\n{"name": "' }}
                {{- tool_call.name }}
                {{- '", "arguments": ' }}
                {%- if tool_call.arguments is string %}
                    {{- tool_call.arguments }}
                {%- else %}
                    {{- tool_call.arguments | tojson }}
                {%- endif %}
                {{- '}\n</tool_call>' }}
            {%- endfor %}
        {%- endif %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- message.content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
    {%- if enable_thinking is defined and enable_thinking is false %}
        {{- '<think>\n\n</think>\n\n' }}
    {%- endif %}
{%- endif %}
```
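To see the ChatML format the template produces for a plain conversation (no tools, no thinking block), here is a minimal pure-Python sketch. It is illustrative only; in practice, use `tokenizer.apply_chat_template()`:

```python
# Minimal sketch of ChatML rendering for a simple exchange.
# Illustrative only; the real template also handles tools and <think> blocks.
def render_chatml(messages, add_generation_prompt=True):
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
    return "".join(out)

prompt = render_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```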
📚 Use Cases
- Research: Long-context NLP experiments, model compression studies.
- Development: Edge deployments, real-time chatbots with extended context.
- Enterprise: Cost-effective AI solutions for document processing and code generation.
🔧 Technical Details
Potential Biases
- Trained on diverse data but may inherit societal biases (e.g., gender, cultural assumptions).
- "Not-for-all-audiences" tag indicates potential for generating sensitive content.
Technical Limitations
- 4-bit quantization may slightly reduce accuracy on complex tasks.
- Performance depends on hardware (MLX optimized for Apple Silicon).
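The accuracy caveat can be made concrete with a toy round-trip: mapping values onto 16 levels (4 bits) and back introduces a small but nonzero error. This is plain uniform quantization, not the actual AWQ scheme, which reduces the error further with activation-aware per-channel scaling:

```python
import random

# Toy 4-bit uniform quantization round-trip (NOT the AWQ algorithm itself).
random.seed(0)
weights = [random.gauss(0, 1) for _ in range(10_000)]

lo, hi = min(weights), max(weights)
scale = (hi - lo) / 15  # 4 bits -> 16 levels

def quantize(w):
    return round((w - lo) / scale)  # integer code in [0, 15]

def dequantize(q):
    return q * scale + lo

err = max(abs(w - dequantize(quantize(w))) for w in weights)
print(f"max round-trip error: {err:.4f} (half a step = {scale / 2:.4f})")
```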
Mitigation Strategies
- Review outputs for sensitive content.
- Use in controlled environments with monitoring.
📄 License
Apache 2.0
⚠️ Important Note
This model is a community contribution and may not be officially supported by Alibaba Cloud. Always validate outputs for accuracy and safety in production environments.