🚀 ⚛️ Q Model: Optimized for Enhanced Quantized Inference Capability
This model is optimized for quantized inference. It is recommended for 3-bit to 8-bit quantization scenarios, where it is intended to give more efficient and more accurate inference results.
✨ Features
- Optimized Quantized Inference: Designed to improve quantized inference performance, making it suitable for 3-bit to 8-bit quantization scenarios.
- Multilingual Support: Supports multiple languages including Chinese, English, French, German, Japanese, Korean, Italian, and Finnish.
- Long Context Length: A context length of 200K tokens lets it handle long inputs and more complex tasks.
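To give a rough sense of what 3-bit to 8-bit quantization means in practice, the sketch below estimates the weights-only memory footprint at several bit widths. It assumes about 32 billion parameters (taking the "32B" in the base model name Qwen/QwQ-32B at face value) and ignores quantization metadata, activations, and the KV cache, so real usage will be higher. The function name is illustrative only.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate weights-only footprint in GiB (no scales, no KV cache)."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# Assumption: parameter count inferred from the "32B" in the base model name.
PARAMS = 32e9

if __name__ == "__main__":
    for bits in (3, 4, 8, 16):
        print(f"{bits:>2}-bit: {weight_memory_gib(PARAMS, bits):6.1f} GiB")
```

The estimate scales linearly with bit width, so halving the bits per weight halves the weight memory.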
📦 Installation
No dedicated installation steps are listed for this model. The usage notes below assume that the transformers or vllm library is already available.
💻 Usage Examples
Basic Usage
We recommend using the fast tokenizer from transformers, which should be enabled by default in the transformers and vllm libraries. Other implementations, including sentencepiece, may not work as expected, especially for special tokens like <|role|>, <|says|> and <|end|>.
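One way to sanity-check a tokenizer is sketched below. It assumes that a correctly loaded tokenizer reports `is_fast` and encodes each special token as a single id; the single-id expectation is an assumption about this model's vocabulary, not something the model card states. The helper name `check_tokenizer` is hypothetical.

```python
SPECIAL_TOKENS = ("<|role|>", "<|says|>", "<|end|>")


def check_tokenizer(tokenizer, tokens=SPECIAL_TOKENS):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    # transformers tokenizers expose `is_fast`; treat a missing attribute as slow.
    if not getattr(tokenizer, "is_fast", False):
        problems.append("tokenizer is not a fast tokenizer")
    for tok in tokens:
        ids = tokenizer.encode(tok, add_special_tokens=False)
        # Assumption: each special token should map to exactly one id.
        if len(ids) != 1:
            problems.append(f"{tok!r} was split into {len(ids)} ids")
    return problems


# Usage sketch (requires transformers; the path is a placeholder):
# from transformers import AutoTokenizer
# tok = AutoTokenizer.from_pretrained("path/to/this-model", use_fast=True)
# print(check_tokenizer(tok))
```

The prompt layout built from these special tokens follows.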
<|role|>system<|says|>You(assistant) are a helpful, respectful and honest INTP-T AI Assistant named Buddy. You are talking to a human(user).
Always answer as helpfully and logically as possible, while being safe. Your answers should not include any harmful, political, religious, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
You cannot access the internet, but you have vast knowledge, cutoff: 2023-04.
You are trained by OpenBuddy team, (https://openbuddy.ai, https://github.com/OpenBuddy/OpenBuddy), not related to GPT or OpenAI.<|end|>
<|role|>user<|says|>History input 1<|end|>
<|role|>assistant<|says|>History output 1<|end|>
<|role|>user<|says|>History input 2<|end|>
<|role|>assistant<|says|>History output 2<|end|>
<|role|>user<|says|>Current input<|end|>
<|role|>assistant<|says|>
This format is also defined in tokenizer_config.json, which means you can directly use vllm to deploy an OpenAI-like API service. For more information, please refer to the vllm documentation.
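When the chat template from tokenizer_config.json is not applied automatically, the prompt shown above can be assembled by hand. The following is a minimal sketch, assuming turns are separated by a single newline exactly as they appear in the example; the chat template in tokenizer_config.json remains the authoritative definition. The function name `build_prompt` is illustrative.

```python
def build_prompt(system: str, history: list[tuple[str, str]], current_input: str) -> str:
    """Assemble a prompt in the <|role|>...<|says|>...<|end|> layout.

    `history` is a list of (user_text, assistant_text) pairs, oldest first.
    """
    lines = [f"<|role|>system<|says|>{system}<|end|>"]
    for user_text, assistant_text in history:
        lines.append(f"<|role|>user<|says|>{user_text}<|end|>")
        lines.append(f"<|role|>assistant<|says|>{assistant_text}<|end|>")
    lines.append(f"<|role|>user<|says|>{current_input}<|end|>")
    # The prompt ends with an open assistant turn for the model to complete.
    lines.append("<|role|>assistant<|says|>")
    # Assumption: turns are joined with a single newline, as in the example.
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(
        "You are a helpful assistant.",
        [("History input 1", "History output 1")],
        "Current input",
    ))
```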
📚 Documentation
Model Info
| Property | Details |
|----------|---------|
| Base Model | Qwen/QwQ-32B |
| Context Length | 200K Tokens |
| License | Apache 2.0 |
Prompt Format
The recommended prompt format is shown above, and it is also defined in tokenizer_config.json. You can use vllm to deploy an OpenAI-like API service.
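Once such a service is running, a client can talk to it through the OpenAI-style chat completions endpoint, and the server applies the chat template itself. The sketch below uses only the Python standard library. The base URL (vllm's default port 8000 on localhost) and the model name are assumptions to be replaced with your own deployment's values, and the helper names are illustrative.

```python
import json
import urllib.request


def build_chat_request(model, user_text, system_text=None, max_tokens=512):
    """Build an OpenAI-style chat completions payload."""
    messages = []
    if system_text:
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    return {"model": model, "messages": messages, "max_tokens": max_tokens}


def send_chat_request(payload, base_url="http://localhost:8000/v1"):
    """POST the payload to the server and return the assistant's reply text.

    Assumption: an OpenAI-compatible server is listening at `base_url`.
    """
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage sketch ("my-served-model" is a placeholder for the served model name):
# payload = build_chat_request("my-served-model", "Hello!")
# print(send_chat_request(payload))
```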
Disclaimer
⚠️ Important Note
All OpenBuddy models have inherent limitations and may potentially produce outputs that are erroneous, harmful, offensive, or otherwise undesirable. Users should not use these models in critical or high-stakes situations that may lead to personal injury, property damage, or significant losses. Examples of such scenarios include, but are not limited to, the medical field, controlling software and hardware systems that may cause harm, and making important financial or legal decisions.
OpenBuddy is provided "as-is" without any warranty of any kind, either express or implied, including, but not limited to, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement. In no event shall the authors, contributors, or copyright holders be liable for any claim, damages, or other liabilities, whether in an action of contract, tort, or otherwise, arising from, out of, or in connection with the software or the use or other dealings in the software.
By using OpenBuddy, you agree to these terms and conditions, and acknowledge that you understand the potential risks associated with its use. You also agree to indemnify and hold harmless the authors, contributors, and copyright holders from any claims, damages, or liabilities arising from your use of OpenBuddy.
