OpenBuddy QWQ 32B V25.2Q 200K: An Open-Source Multilingual Chatbot

Developed by OpenBuddy

A multilingual chatbot based on the Qwen/QwQ-32B model, optimized for quantized inference and supporting 8 languages.

Large Language Model

Safetensors

Supports Multiple Languages

Open Source License: Apache-2.0

#Multilingual Chat Assistant #200K Token Long Context #Quantized Inference Optimization

Downloads 41

Release Time: 4/19/2025

Model Overview

OpenBuddy is an open multilingual chatbot assistant. This release is optimized for quantized inference and is recommended for 3- to 8-bit quantization scenarios.

Model Features

Enhanced Quantized Inference

Optimized to improve inference performance in 3- to 8-bit quantization scenarios

Multilingual Support

Supports text generation and understanding in 8 major languages

Long Context Processing

Supports contexts of up to 200K tokens

Safe Content Control

Built-in safety mechanisms to prevent generation of harmful, discriminatory, or inappropriate content

Model Capabilities

Multilingual Dialogue

Long Text Understanding

Knowledge Q&A

Quantized Inference Optimization

Use Cases

Intelligent Assistant

Multilingual Customer Service Bot

Provides multilingual customer support services for enterprises

Can handle customer inquiries in multiple languages

Educational Assistance

Helps students with multilingual learning and knowledge queries

Answers from knowledge up to its April 2023 cutoff

Quantized Inference Applications

Edge Device Deployment

Deploy quantized models on resource-constrained devices

Efficient inference in 3- to 8-bit quantization scenarios

🚀 ⚛️ Q Model: Optimized for Enhanced Quantized Inference Capability

This model is specifically optimized for quantized inference and is recommended for use in 3- to 8-bit quantization scenarios.
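To give a rough sense of what 3- to 8-bit quantization means for a model of this size, the sketch below estimates the memory taken by the weights alone. It is back-of-the-envelope arithmetic, not a measurement: it assumes "32B" means exactly 32 billion parameters, and it ignores the KV cache, activations, and the per-block scale metadata that real quantization formats add. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in gigabytes (10**9 bytes)."""
    # Each parameter takes bits_per_weight bits; 8 bits make one byte.
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: "32B" is treated as exactly 32 billion parameters.
PARAMS = 32e9

for bits in (3, 4, 5, 6, 8):
    print(f"{bits}-bit weights: ~{weight_memory_gb(PARAMS, bits):.0f} GB")
```

Actual file sizes and runtime memory will be higher than these figures, and a long context adds KV-cache memory on top.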

✨ Features

Optimized Quantized Inference: Designed to improve the performance of quantized inference, making it suitable for 3- to 8-bit quantization scenarios.
Multilingual Support: Supports Chinese, English, French, German, Japanese, Korean, Italian, and Finnish.
Long Context Length: A context length of 200K tokens lets it handle long inputs and extended conversations.

💻 Usage Examples

Basic Usage

We recommend using the fast tokenizer from transformers, which should be enabled by default in the transformers and vllm libraries. Other implementations, including sentencepiece, may not work as expected, especially for special tokens such as <|role|>, <|says|>, and <|end|>.

<|role|>system<|says|>You(assistant) are a helpful, respectful and honest INTP-T AI Assistant named Buddy. You are talking to a human(user).
Always answer as helpfully and logically as possible, while being safe. Your answers should not include any harmful, political, religious, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
You cannot access the internet, but you have vast knowledge, cutoff: 2023-04.
You are trained by OpenBuddy team, (https://openbuddy.ai, https://github.com/OpenBuddy/OpenBuddy), not related to GPT or OpenAI.<|end|>
<|role|>user<|says|>History input 1<|end|>
<|role|>assistant<|says|>History output 1<|end|>
<|role|>user<|says|>History input 2<|end|>
<|role|>assistant<|says|>History output 2<|end|>
<|role|>user<|says|>Current input<|end|>
<|role|>assistant<|says|>

This format is also defined in tokenizer_config.json, which means you can directly use vllm to deploy an OpenAI-like API service. For more information, please refer to the vllm documentation.
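When the chat template is not applied automatically, the prompt layout shown above can be assembled by hand. The following is a minimal sketch: `build_prompt` is a hypothetical helper, and joining turns with a single newline is an assumption read off the example above, so compare the result with the template in tokenizer_config.json before relying on it.

```python
from typing import Iterable, Tuple


def build_prompt(system: str, turns: Iterable[Tuple[str, str]]) -> str:
    """Assemble a prompt from a system message and (role, text) turns."""
    # The system message comes first, closed with <|end|>.
    parts = [f"<|role|>system<|says|>{system}<|end|>"]
    # Each history turn uses the same role/says/end framing.
    for role, text in turns:
        parts.append(f"<|role|>{role}<|says|>{text}<|end|>")
    # Leave the assistant turn open so the model writes the reply.
    parts.append("<|role|>assistant<|says|>")
    # Assumption: turns are separated by a single newline.
    return "\n".join(parts)


prompt = build_prompt(
    "You are a helpful assistant named Buddy.",
    [
        ("user", "History input 1"),
        ("assistant", "History output 1"),
        ("user", "Current input"),
    ],
)
print(prompt)
```

The final line is deliberately left without <|end|>, matching the example above, so that generation continues from the assistant's turn.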

📚 Documentation

Model Info

Property	Details
Base Model	Qwen/QwQ-32B
Context Length	200K Tokens
License	Apache 2.0

Prompt Format

The recommended prompt format is shown above, and it is also defined in tokenizer_config.json. You can use vllm to deploy an OpenAI-like API service.
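Once such a service is running, a client sends ordinary chat messages and the server applies the prompt format itself. The sketch below builds a request for the OpenAI-style `/v1/chat/completions` route using only the Python standard library. The base URL, port, and model name are placeholders (assumptions, not values from this page), and `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(base_url, model, messages, max_tokens=512):
    """Build (but do not send) a POST request for a chat completion."""
    body = {"model": model, "messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(request):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


# Assumptions: a server listening locally on port 8000, and a placeholder
# model name; use whatever name your deployment actually serves.
req = build_chat_request(
    "http://localhost:8000",
    "openbuddy-model-placeholder",
    [{"role": "user", "content": "Hello, Buddy!"}],
)
print(req.full_url)
# reply = chat(req)  # uncomment once a server is actually running
```

Because the server holds the chat template, the client never needs to write <|role|>, <|says|>, or <|end|> tokens itself.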

Disclaimer

⚠️ Important Note

All OpenBuddy models have inherent limitations and may potentially produce outputs that are erroneous, harmful, offensive, or otherwise undesirable. Users should not use these models in critical or high-stakes situations that may lead to personal injury, property damage, or significant losses. Examples of such scenarios include, but are not limited to, the medical field, controlling software and hardware systems that may cause harm, and making important financial or legal decisions.

OpenBuddy is provided "as-is" without any warranty of any kind, either express or implied, including, but not limited to, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement. In no event shall the authors, contributors, or copyright holders be liable for any claim, damages, or other liabilities, whether in an action of contract, tort, or otherwise, arising from, out of, or in connection with the software or the use or other dealings in the software.

By using OpenBuddy, you agree to these terms and conditions, and acknowledge that you understand the potential risks associated with its use. You also agree to indemnify and hold harmless the authors, contributors, and copyright holders from any claims, damages, or liabilities arising from your use of OpenBuddy.
