
QwQ-32B GGUF

Developed by TensorBlock
A GGUF-format quantized version of QwQ-32B, suitable for local deployment and inference
Downloads 1,312
Release Time: 3/5/2025

Model Overview

This repository contains GGUF-format model files for Qwen/QwQ-32B, quantized using TensorBlock's machines and compatible with llama.cpp.

Model Features

Multiple quantization versions
Provides 12 quantization versions from Q2_K to Q8_0 to meet different hardware and performance needs
llama.cpp compatibility
Compatible with llama.cpp as of commit b4823, making local deployment straightforward
Chat optimization
Provides a dedicated chat prompt template to improve conversational interactions (a sketch follows this list)
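
As an illustration of local deployment, the sketch below downloads one of the quantized GGUF files and runs it through llama-cpp-python (Python bindings for llama.cpp). The repo id, the filename, and the ChatML-style prompt template are assumptions based on Qwen's usual conventions; check the repository's file list and model card before use.

```python
# Minimal sketch: download a quantized GGUF file and run it locally with
# llama-cpp-python. The repo id, filename, and prompt template below are
# assumptions; verify them against the actual repository.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="tensorblock/QwQ-32B-GGUF",   # hypothetical repo id
    filename="QwQ-32B-Q4_K_M.gguf",       # pick the quant that fits your hardware
    n_ctx=4096,                           # context window
    n_gpu_layers=-1,                      # offload all layers to GPU if available
)

# ChatML-style template commonly used by Qwen models (assumption).
prompt = (
    "<|im_start|>user\n"
    "Explain what GGUF quantization is in one paragraph.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

out = llm(prompt, max_tokens=256, stop=["<|im_end|>"])
print(out["choices"][0]["text"])
```

Smaller quants (e.g. Q2_K, Q4_K_M) trade some quality for lower memory use, while Q8_0 stays closest to the original weights but needs the most RAM or VRAM.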

Model Capabilities

Text generation
Conversational interaction

Use Cases

Dialogue systems
Intelligent chat assistant
Deploy chatbots that run entirely locally (a sketch follows this list)
Content generation
Text creation
Generate a variety of written content
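
For the local chatbot use case, the sketch below wires a downloaded quantized file into a minimal multi-turn chat loop using llama-cpp-python's chat-completion API. The model path and system prompt are placeholders, not values from this repository.

```python
# Minimal sketch of a local chatbot loop on top of a downloaded GGUF file.
# Assumptions: llama-cpp-python is installed and "./QwQ-32B-Q4_K_M.gguf" is a
# placeholder path to one of the quantized files from this repository.
from llama_cpp import Llama

llm = Llama(model_path="./QwQ-32B-Q4_K_M.gguf", n_ctx=4096, n_gpu_layers=-1)

# Conversation history; the system prompt is illustrative.
messages = [{"role": "system", "content": "You are a helpful local assistant."}]

while True:
    user_input = input("You: ").strip()
    if user_input.lower() in {"exit", "quit"}:
        break
    messages.append({"role": "user", "content": user_input})

    # create_chat_completion applies a chat template (from the GGUF metadata
    # or a configured chat_format) before generation.
    reply = llm.create_chat_completion(messages=messages, max_tokens=512)
    answer = reply["choices"][0]["message"]["content"]
    print(f"Assistant: {answer}")

    # Keep the assistant turn in the history for multi-turn context.
    messages.append({"role": "assistant", "content": answer})
```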