Qwen-Audio-nf4
This is the NF4-quantized (4-bit) version of Qwen-Audio, a multi-task audio-language model that accepts diverse audio types together with text and generates text output.
Quick Start
Below, we provide a simple example showing how to use Qwen-Audio with 🤗 Transformers.
Before running the code, make sure your environment meets the requirements listed under Installation, then install the dependent libraries.
pip install -r requirements.txt
For more details, please refer to the tutorial.
Usage Examples
Basic Usage
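from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
import torch

# Fix the random seed for reproducibility.
torch.manual_seed(1234)

# trust_remote_code=True is needed because the tokenizer and model ship custom code.
# Note: "Qwen/Qwen-Audio" is the original checkpoint; substitute this repository's model ID to load the quantized weights.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-Audio", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-Audio", device_map="cuda", trust_remote_code=True).eval()

# The query wraps the audio URL in <audio> tags, followed by the task prompt.
audio_url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Audio/1272-128104-0000.flac"
sp_prompt = "<|startoftranscript|><|en|><|transcribe|><|en|><|notimestamps|><|wo_itn|>"
query = f"<audio>{audio_url}</audio>{sp_prompt}"

# Preprocess the audio referenced in the query, then tokenize and move the inputs to the model's device.
audio_info = tokenizer.process_audio(query)
inputs = tokenizer(query, return_tensors='pt', audio_info=audio_info)
inputs = inputs.to(model.device)

# Generate and decode; special tokens are kept in the decoded output.
pred = model.generate(**inputs, audio_info=audio_info)
response = tokenizer.decode(pred.cpu()[0], skip_special_tokens=False, audio_info=audio_info)
print(response)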
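The query passed to the tokenizer above is an ordinary string: an `<audio>` tag wrapping the audio URL, followed by the task prompt. The following is a minimal sketch of how that string could be assembled programmatically. The helper names `build_transcription_prompt` and `build_query` are hypothetical and not part of the Qwen-Audio tokenizer, and the only tag sequence confirmed by this document is the English transcription prompt shown above; any other language code is an assumption to verify against the Qwen-Audio repository.

```python
def build_transcription_prompt(lang: str = "en") -> str:
    """Assemble a transcription task prompt (hypothetical helper).

    Only lang="en" appears in this document. Other language codes are an
    assumption and should be checked against the Qwen-Audio repository.
    """
    return (
        "<|startoftranscript|>"
        f"<|{lang}|>"       # assumed: language of the audio
        "<|transcribe|>"
        f"<|{lang}|>"       # assumed: language of the output text
        "<|notimestamps|>"
        "<|wo_itn|>"
    )


def build_query(audio_url: str, prompt: str) -> str:
    """Wrap the audio location in <audio> tags and append the task prompt."""
    return f"<audio>{audio_url}</audio>{prompt}"


if __name__ == "__main__":
    url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Audio/1272-128104-0000.flac"
    print(build_query(url, build_transcription_prompt()))
```

With the default arguments, the helper reproduces the `query` string from the example above, so it can be passed unchanged to `tokenizer.process_audio` and `tokenizer`.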
⨠Features
Qwen-Audio (Qwen Large Audio Language Model) is the multimodal version of Qwen (short for Tongyi Qianwen), the large model series proposed by Alibaba Cloud. Qwen-Audio accepts diverse audio (human speech, natural sound, music and song) and text as inputs, and outputs text. The contributions of Qwen-Audio include:
- Fundamental audio models: Qwen-Audio is a fundamental multi-task audio-language model that supports various tasks, languages, and audio types, serving as a universal audio understanding model. Building upon Qwen-Audio, we develop Qwen-Audio-Chat through instruction fine-tuning, enabling multi-turn dialogues and supporting diverse audio-oriented scenarios.
- Multi-task learning framework for all types of audio: To scale up audio-language pre-training, we address the variation in textual labels across datasets by proposing a multi-task training framework that enables knowledge sharing and avoids one-to-many interference. Our model incorporates more than 30 tasks, and extensive experiments show that it achieves strong performance.
- Strong performance: Experimental results show that Qwen-Audio outperforms its counterparts across diverse benchmark tasks without requiring any task-specific fine-tuning. Specifically, Qwen-Audio achieves state-of-the-art results on the test sets of Aishell1, CochlScene, ClothoAQA, and VocalSound.
- Flexible multi-turn chat from audio and text input: Qwen-Audio supports multiple-audio analysis, sound understanding and reasoning, music appreciation, and tool usage for speech editing.
Installation
Requirements
- Python 3.8 or later
- PyTorch 1.12 or later (2.0 or later is recommended)
- CUDA 11.4 or later is recommended (for GPU users)
- FFmpeg
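The requirements above can be checked before installing the dependencies. The following is a minimal sketch using only the Python standard library; `parse_version` and `check_environment` are hypothetical helper names, and the script needs Python 3.8 or later itself because it relies on `importlib.metadata`. It reads the installed PyTorch version from package metadata without importing `torch`, so it does not verify CUDA, which would have to be checked separately.

```python
import re
import shutil
import sys
from importlib import metadata


def parse_version(text: str) -> tuple:
    """Extract (major, minor) from a version string such as '2.1.0+cu118'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def check_environment() -> dict:
    """Return a mapping from each requirement to True (met) or False (not met)."""
    report = {
        "python>=3.8": sys.version_info >= (3, 8),
        # FFmpeg is an external binary, so look for it on PATH.
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
    }
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        report["torch>=1.12"] = False  # PyTorch is not installed
    else:
        report["torch>=1.12"] = parse_version(torch_version) >= (1, 12)
    return report


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{'OK     ' if ok else 'MISSING'} {requirement}")
```

Running the script prints one line per requirement, which makes it easy to see what is missing before running `pip install -r requirements.txt`.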
Documentation
We release Qwen-Audio and Qwen-Audio-Chat, which are the pretrained model and the chat model, respectively. For more details about Qwen-Audio, please refer to our GitHub repo. This repo is the one for Qwen-Audio.
License
Researchers and developers are free to use the code and model weights of Qwen-Audio. Commercial use is also permitted. See LICENSE for details.
Technical Details
Citation
If you find our paper and code useful in your research, please consider starring the repository and citing the paper:
@article{Qwen-Audio,
title={Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models},
author={Chu, Yunfei and Xu, Jin and Zhou, Xiaohuan and Yang, Qian and Zhang, Shiliang and Yan, Zhijie and Zhou, Chang and Zhou, Jingren},
journal={arXiv preprint arXiv:2311.07919},
year={2023}
}
Contact Us
If you would like to leave a message for either our research team or our product team, feel free to send an email to qianwen_opensource@alibabacloud.com.