QwQ-32B-Preview-AWQ open-source model - lower memory and compute requirements, suited to deployment on limited hardware

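QwQ 32B Preview AWQ

Developed by KirillR

An AWQ 4-bit quantized version of QwQ-32B-Preview that significantly reduces memory usage and compute requirements, making it suitable for deployment on resource-constrained hardware.

Large Language Model

Transformers

English

License: Apache-2.0

#4-bit quantized inference #enhanced math and coding #low-VRAM deployment

Downloads: 2,247

Released: 11/27/2024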

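Model Overview

An AWQ 4-bit quantized version of the QwQ-32B-Preview model developed by the Qwen team. The underlying model is designed to advance AI reasoning capabilities and performs especially well on math and coding tasks.

Model Features

Low resource footprint

The quantized model uses significantly less memory and compute, making it suitable for deployment on resource-constrained hardware.

Strong in specific domains

The model is designed to advance AI reasoning capabilities and performs especially well on math and coding tasks.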

Model Capabilities

Text generation

Mathematical reasoning

Coding assistance

Multi-turn dialogue

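Use Cases

Education

Math problem solving

Solves complex math problems and provides step-by-step reasoning.

Performs well on mathematical reasoning tasks

Programming

Code generation and assistance

Helps developers generate code or solve programming problems.

Performs well on coding tasks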

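🚀 QwQ-32B-Preview AWQ 4-bit Quantized Version

This project provides an AWQ 4-bit quantized version of the QwQ-32B-Preview model developed by the Qwen team. The quantized model uses significantly less memory and compute, making it suitable for deployment on resource-constrained hardware.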
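
As a rough illustration of the memory saving described above, the sketch below estimates weight storage only. It assumes a parameter count of about 32.5 billion (a figure not stated in this card) and ignores activations, the KV cache, and the per-group scales and zero points that AWQ stores alongside the 4-bit weights. `weight_memory_gb` is a hypothetical helper written for this example.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Weights-only storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: roughly 32.5 billion parameters; this figure is not given in the card.
N_PARAMS = 32.5e9

fp16_gb = weight_memory_gb(N_PARAMS, 16)  # unquantized 16-bit weights
awq4_gb = weight_memory_gb(N_PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f"4-bit weights:  ~{awq4_gb:.1f} GB")
print(f"reduction:      ~{fp16_gb / awq4_gb:.0f}x")
```

Actual usage will sit above the 4-bit estimate, since quantization metadata, any layers left unquantized, and runtime buffers all add to it.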

🚀 Quick Start

To load the tokenizer and model and generate content with the quantized model, follow these steps:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "KirillR/QwQ-32B-Preview-AWQ"

# Load the quantized model; device_map="auto" places it on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build the conversation as a list of system and user messages.
prompt = "How many 'r's are in 'strawberry'?"
messages = [
    {"role": "system", "content": "You are a helpful assistant developed by Alibaba. Please think step-by-step."},
    {"role": "user", "content": prompt}
]
# Render the messages into one prompt string with the generation prompt appended.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate up to 1024 new tokens.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024
)
# Drop the prompt tokens so that only the newly generated tokens remain.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

# Decode the generated token IDs back into text.
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(response)
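
To make the prompt-formatting and slicing steps above more concrete, here is a minimal sketch in plain Python. It assumes the tokenizer's chat template follows the ChatML layout (`<|im_start|>` and `<|im_end|>` markers) commonly used by Qwen models; the real template ships with the tokenizer and may differ, for example by injecting a default system prompt. `render_chatml` and `strip_prompt` are hypothetical helpers, not part of Transformers.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Approximate a ChatML-style template (assumed layout, not read from the tokenizer)."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


def strip_prompt(input_ids, output_ids):
    """Keep only the tokens generated after the prompt, as the slicing step does."""
    return output_ids[len(input_ids):]


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How many 'r's are in 'strawberry'?"},
]
print(render_chatml(messages))

# Toy token IDs: the first three stand for the prompt, the rest for new tokens.
print(strip_prompt([1, 2, 3], [1, 2, 3, 7, 8]))
```

The slicing matters because `generate` returns the prompt tokens followed by the new ones; without it, the decoded response would repeat the prompt.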

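📦 Installation Guide

Make sure you are using a recent version of Hugging Face Transformers, since the Qwen2.5 code is integrated into the library. Versions earlier than 4.37.0 may fail with the following error: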

KeyError: 'qwen2'
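
A quick way to check for this before loading the model is to compare the installed version against the 4.37.0 threshold mentioned above. The sketch below uses only the standard library; `version_tuple` and `supports_qwen2` are hypothetical helper names, and the parser is deliberately simple: it compares only the leading numeric components, so a pre-release such as `4.37.0.dev0` is treated like `4.37.0`.

```python
import re
from importlib import metadata

MIN_TRANSFORMERS = "4.37.0"  # threshold stated in the installation notes


def version_tuple(version: str) -> tuple:
    """Return the leading numeric components, e.g. '4.38.0.dev0' -> (4, 38, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def supports_qwen2(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    """True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


try:
    installed = metadata.version("transformers")
except metadata.PackageNotFoundError:
    print("transformers is not installed")
else:
    if supports_qwen2(installed):
        print(f"transformers {installed} should recognise the 'qwen2' model type")
    else:
        print(f"transformers {installed} is older than {MIN_TRANSFORMERS}; consider upgrading")
```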

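📚 Documentation

Model Limitations

Language mixing and code-switching: The model may unexpectedly switch between languages or mix them, which reduces the clarity of its responses.
Recursive reasoning loops: The model can get stuck in circular reasoning, producing lengthy responses without reaching a clear answer.
Safety and ethical considerations: The model needs stronger safety measures to ensure reliable and safe behavior. Users should exercise caution when deploying it.
Performance limitations: Although the model performs well on math and coding, it may underperform in areas such as common-sense reasoning and nuanced language understanding.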
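
One possible guard against the reasoning-loop limitation listed above is to check generated text for heavily repeated word sequences and, if any are found, truncate or retry. This is only an illustrative heuristic, not something the model card prescribes; `has_repeated_ngram` and its default thresholds are assumptions, and whitespace splitting will not work well for languages written without spaces.

```python
from collections import Counter


def has_repeated_ngram(text: str, n: int = 8, max_repeats: int = 3) -> bool:
    """Return True if any n-word sequence occurs more than max_repeats times."""
    words = text.split()
    if len(words) < n:
        return False
    counts = Counter(
        tuple(words[i:i + n]) for i in range(len(words) - n + 1)
    )
    return max(counts.values()) > max_repeats


looping = "Wait, let me check that again. " * 10
normal = "The word strawberry contains the letter r three times."

print(has_repeated_ngram(looping, n=4))  # repeated phrase is flagged
print(has_repeated_ngram(normal, n=4))   # ordinary sentence is not
```

In practice this check would run on the decoded response, together with a cap such as `max_new_tokens`, so that a looping generation is both bounded and detectable.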

🔗 Original Model

For more details on the original QwQ-32B-Preview model, see the following resource:

https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct-AWQ

📄 License

This project is licensed under Apache-2.0.

📖 Citation

If you find the original model helpful, please consider citing the original authors:

@misc{qwq-32b-preview,
    title = {QwQ: Reflect Deeply on the Boundaries of the Unknown},
    url = {https://qwenlm.github.io/blog/qwq-32b-preview/},
    author = {Qwen Team},
    month = {November},
    year = {2024}
}

@article{qwen2,
      title={Qwen2 Technical Report}, 
      author={An Yang and Baosong Yang and others},
      journal={arXiv preprint arXiv:2407.10671},
      year={2024}
}