🚀 Qwen2-Audio-7B
Qwen2-Audio is the new series of Qwen large audio-language models. It can accept various audio signal inputs and, given speech instructions, perform audio analysis or respond directly with text. This project provides two different audio interaction modes, offering users a varied experience.
🚀 Quick Start
Requirements
The code for Qwen2-Audio has been merged into the latest Hugging Face `transformers`. We recommend building from source with the following command; otherwise you may encounter the error `KeyError: 'qwen2-audio'`:

```bash
pip install git+https://github.com/huggingface/transformers
```
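To confirm that the source build actually registered the model class, a quick sanity check like the one below can help (a minimal sketch; the printed version string is only illustrative):

```python
# Sanity check: this import fails on transformers builds that predate
# Qwen2-Audio support, the same root cause as KeyError: 'qwen2-audio'.
import transformers
from transformers import Qwen2AudioForConditionalGeneration  # noqa: F401

print(transformers.__version__)  # a source build typically reports a .dev0 version
```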
Code Example
The following code shows how to load the processor and model, and run the pretrained Qwen2-Audio base model to generate content:
```python
from io import BytesIO
from urllib.request import urlopen

import librosa
from transformers import AutoProcessor, Qwen2AudioForConditionalGeneration

model = Qwen2AudioForConditionalGeneration.from_pretrained("Qwen/Qwen2-Audio-7B", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-Audio-7B", trust_remote_code=True)

# The <|AUDIO|> placeholder marks where the audio features are injected into the prompt.
prompt = "<|audio_bos|><|AUDIO|><|audio_eos|>Generate the caption in English:"
url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Audio/glass-breaking-151256.mp3"
# Resample the clip to the feature extractor's expected sampling rate.
audio, sr = librosa.load(BytesIO(urlopen(url).read()), sr=processor.feature_extractor.sampling_rate)
inputs = processor(text=prompt, audios=audio, return_tensors="pt")

generated_ids = model.generate(**inputs, max_length=256)
# Drop the prompt tokens so only the newly generated caption is decoded.
generated_ids = generated_ids[:, inputs.input_ids.size(1):]
response = processor.batch_decode(generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(response)
```
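The snippet above loads the weights in full precision on CPU. On a CUDA machine, a common variant is to load in half precision and let `device_map="auto"` place the weights; this is a sketch assuming `torch` and `accelerate` are installed, not a requirement stated by the project:

```python
import torch
from transformers import AutoProcessor, Qwen2AudioForConditionalGeneration

# Half-precision weights roughly halve memory use; device_map="auto" needs
# the accelerate package (an assumption of this sketch) to place the shards.
model = Qwen2AudioForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-Audio-7B",
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-Audio-7B")

# Processor outputs then need to live on the model's device before generate():
# inputs = processor(text=prompt, audios=audio, return_tensors="pt").to(model.device)
```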
✨ Key Features
- Voice Chat: users can interact with Qwen2-Audio by voice alone, without any text input.
- Audio Analysis: users can provide both audio and text instructions during the interaction for audio analysis.
This project releases Qwen2-Audio-7B and Qwen2-Audio-7B-Instruct, the pretrained model and the chat model respectively; a usage sketch for the chat model follows below.
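For the chat model, audio is passed through the processor's chat template rather than a raw prompt string. The sketch below illustrates the audio-analysis mode, reusing the glass-breaking clip from the example above; the conversation structure follows the standard `transformers` multimodal chat format, the question text is only illustrative, and loading with `device_map="auto"` assumes `accelerate` is installed:

```python
from io import BytesIO
from urllib.request import urlopen

import librosa
from transformers import AutoProcessor, Qwen2AudioForConditionalGeneration

processor = AutoProcessor.from_pretrained("Qwen/Qwen2-Audio-7B-Instruct")
model = Qwen2AudioForConditionalGeneration.from_pretrained("Qwen/Qwen2-Audio-7B-Instruct", device_map="auto")

url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Audio/glass-breaking-151256.mp3"
conversation = [
    {"role": "user", "content": [
        {"type": "audio", "audio_url": url},
        {"type": "text", "text": "What happened in this audio?"},  # illustrative question
    ]},
]

# Render the conversation into the model's prompt format, then pair it with the audio.
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audio, _ = librosa.load(BytesIO(urlopen(url).read()), sr=processor.feature_extractor.sampling_rate)
inputs = processor(text=text, audios=[audio], return_tensors="pt", padding=True).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=256)
generated_ids = generated_ids[:, inputs.input_ids.size(1):]  # keep only the new tokens
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```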
For more details, please refer to the blog, the GitHub repository, and the technical report.
📄 License
This project is released under the Apache-2.0 License.
📚 Documentation
Citation
If you find our work helpful, feel free to cite the following papers:
```
@article{Qwen2-Audio,
  title={Qwen2-Audio Technical Report},
  author={Chu, Yunfei and Xu, Jin and Yang, Qian and Wei, Haojie and Wei, Xipin and Guo, Zhifang and Leng, Yichong and Lv, Yuanjun and He, Jinzheng and Lin, Junyang and Zhou, Chang and Zhou, Jingren},
  journal={arXiv preprint arXiv:2407.10759},
  year={2024}
}

@article{Qwen-Audio,
  title={Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models},
  author={Chu, Yunfei and Xu, Jin and Zhou, Xiaohuan and Yang, Qian and Zhang, Shiliang and Yan, Zhijie and Zhou, Chang and Zhou, Jingren},
  journal={arXiv preprint arXiv:2311.07919},
  year={2023}
}
```