whisper-large-v3-narrow-accent open-source model: fine-grained classification of 16 English accents


Whisper Large V3 Narrow Accent

Developed by tiantiaf

A fine-grained accent classification model based on Whisper-Large v3 that recognizes 16 English accents

Audio classification

Safetensors

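Language: English | License: BSD-3-Clause | Tags: #multi-accent-classification #speech-feature-extraction #short-audio-optimization

Downloads: 237

Published: 5/22/2025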

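Model overview

This model implements a fine-grained (narrow) accent classification method that distinguishes 16 English accent types. It is intended for speech trait analysis and speaker characterization tasks.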

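Model features

Fine-grained accent classification

Distinguishes 16 English accent types, including East Asian, English (England), and Germanic accents

Behavior on synthetic speech

Text-to-speech (TTS) samples tend to be classified as Germanic

Built on Whisper-Large v3

Built on the Whisper-Large v3 foundation model, inheriting its speech-processing capabilities

Model capabilities

English accent classification

Speech feature extraction

Speaker trait analysis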

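Use cases

Speech analysis

Speaker accent identification

Identify a speaker's English accent

Classifies speech into one of 16 English accent categories

Speech feature analysis

Extract accent features from speech

Can support speaker trait analysis

Speech technology research

Speech model benchmarking

Part of a benchmark for speech foundation models

Provides a standardized accent classification evaluation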

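🚀 Whisper-Large v3 for narrow accent classification

This model performs narrow (fine-grained) accent classification, assigning English speech to one of 16 accent categories for use in speech recognition and analysis research.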

🚀 Quick start

Download the repository

git clone git@github.com:tiantiaf0627/vox-profile-release.git

Install the package

conda create -n vox_profile python=3.8
conda activate vox_profile
cd vox-profile-release
pip install -e .

Load the model

# Load libraries
import torch
import torch.nn.functional as F
# WhisperWrapper is defined in the vox-profile-release repository cloned above
from src.model.accent.whisper_accent import WhisperWrapper

# Select a device: GPU if available, otherwise CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the model from Hugging Face
model = WhisperWrapper.from_pretrained("tiantiaf/whisper-large-v3-narrow-accent").to(device)
model.eval()

Prediction

# Label list, in the order of the model's output logits
english_accent_list = [
    'East Asia', 'English', 'Germanic', 'Irish',
    'North America', 'Northern Irish', 'Oceania',
    'Other', 'Romance', 'Scottish', 'Semitic', 'Slavic',
    'South African', 'Southeast Asia', 'South Asia', 'Welsh'
]

# Load data; a 1-second tensor of zeros is used here as a placeholder
# The training data excluded audio shorter than 3 seconds (unreliable predictions) and longer than 15 seconds (computational limits)
# Input audio should therefore be 16 kHz, mono, and at most 15 seconds long
max_audio_length = 15 * 16000
data = torch.zeros([1, 16000]).float().to(device)[:, :max_audio_length]

# Inference only, so gradient tracking is disabled
with torch.no_grad():
    logits, embeddings = model(data, return_feature=True)

# Convert logits to probabilities and print the most likely accent
accent_prob = F.softmax(logits, dim=1)
print(english_accent_list[torch.argmax(accent_prob).detach().cpu().item()])
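The input requirements above (16 kHz, mono, 3 to 15 seconds) can be enforced before calling the model. The sketch below is an assumption, not part of vox-profile-release: the helper names `to_mono` and `split_into_windows` are hypothetical, it expects audio that is already resampled to 16 kHz, and it splits longer recordings into consecutive windows of at most 15 seconds, dropping any trailing window under 3 seconds.

```python
import numpy as np

SAMPLE_RATE = 16_000               # the model card asks for 16 kHz input
MIN_SECONDS, MAX_SECONDS = 3, 15   # training clips were kept between 3 and 15 seconds


def to_mono(waveform) -> np.ndarray:
    """Hypothetical helper: average a (channels, samples) array down to 1-D mono."""
    waveform = np.asarray(waveform, dtype=np.float32)
    if waveform.ndim == 1:
        return waveform
    if waveform.ndim == 2:
        return waveform.mean(axis=0)
    raise ValueError(f"expected 1-D or 2-D audio, got shape {waveform.shape}")


def split_into_windows(waveform, sample_rate=SAMPLE_RATE):
    """Hypothetical helper: split 16 kHz audio into windows of at most 15 s.

    Returns a list of float32 arrays shaped (1, n), i.e. a batch of one,
    and drops any window shorter than 3 s.
    """
    if sample_rate != SAMPLE_RATE:
        raise ValueError(f"resample to {SAMPLE_RATE} Hz first (got {sample_rate})")
    mono = to_mono(waveform)
    max_len = MAX_SECONDS * SAMPLE_RATE
    min_len = MIN_SECONDS * SAMPLE_RATE
    windows = []
    for start in range(0, len(mono), max_len):
        chunk = mono[start:start + max_len]
        if len(chunk) >= min_len:
            windows.append(chunk[np.newaxis, :])
    return windows
```

Each window could then be passed to the model as `torch.from_numpy(window).to(device)`. How to combine predictions across windows (for example, averaging the softmax probabilities) is not specified by the model card and is left as a design choice.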

✨ Key features

This model implements the narrow accent classification described in "Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits" (https://arxiv.org/pdf/2505.14648). The English accents covered are:

[
  'East Asia', 'English', 'Germanic', 'Irish', 
  'North America', 'Northern Irish', 'Oceania', 
  'Other', 'Romance', 'Scottish', 'Semitic', 'Slavic', 
  'South African', 'Southeast Asia', 'South Asia', 'Welsh'
]
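Beyond the single most likely label, it can help to inspect how probability is spread across the 16 accents. The following is a minimal sketch in plain Python, assuming (as the card's own example does) that the model's logits follow the order of the list above; `ENGLISH_ACCENTS` is a local copy of that list, and `softmax` and `top_k_accents` are hypothetical helpers, not functions from the repository.

```python
import math

# Local copy of the 16 labels, in the order used by the card's example
ENGLISH_ACCENTS = [
    'East Asia', 'English', 'Germanic', 'Irish',
    'North America', 'Northern Irish', 'Oceania',
    'Other', 'Romance', 'Scottish', 'Semitic', 'Slavic',
    'South African', 'Southeast Asia', 'South Asia', 'Welsh'
]


def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_accents(logits, labels=ENGLISH_ACCENTS, k=3):
    """Return the k most probable (label, probability) pairs, highest first."""
    if len(logits) != len(labels):
        raise ValueError(f"expected {len(labels)} logits, got {len(logits)}")
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

For a single clip, the model's `logits` tensor could be converted with `logits[0].tolist()` before calling `top_k_accents`.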

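Observations about the model (this list will be extended as more are made):

Text-to-speech (TTS) samples are more likely to be classified as having a Germanic accent.

Related repository: https://github.com/tiantiaf0627/vox-profile-release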

📚 Documentation

Citation

If you use our model or find it useful in your work, please cite our paper:

@article{feng2025vox,
  title={Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits},
  author={Feng, Tiantian and Lee, Jihwan and Xu, Anfeng and Lee, Yoonjeong and Lertpetchpun, Thanathai and Shi, Xuan and Wang, Helin and Thebaud, Thomas and Moro-Velazquez, Laureano and Byrd, Dani and others},
  journal={arXiv preprint arXiv:2505.14648},
  year={2025}
}

📄 License

This model is released under the BSD 3-Clause License (bsd-3-clause).

📦 Model information

Property	Details
Model type	Audio classification
Base model	openai/whisper-large-v3
Training dataset	mozilla-foundation/common_voice_11_0
Evaluation metric	Accuracy
Language	English
Tags	model_hub_mixin, pytorch_model_hub_mixin, speaker_accent_classification
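The table lists accuracy as the evaluation metric but does not say whether it is computed over all clips or averaged per accent. As an assumption, the sketch below reports both: overall accuracy (fraction of clips labeled correctly) and the unweighted mean of per-accent accuracy (often called balanced accuracy), which is less dominated by frequent accents. `accuracy_report` is a hypothetical helper, not part of the repository.

```python
from collections import Counter


def accuracy_report(predicted, reference):
    """Return (overall accuracy, per-accent accuracy dict, balanced accuracy)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("need at least one example")
    pairs = list(zip(predicted, reference))
    correct = sum(p == r for p, r in pairs)
    totals = Counter(reference)                      # clips per true accent
    hits = Counter(r for p, r in pairs if p == r)    # correct clips per true accent
    per_accent = {label: hits[label] / n for label, n in totals.items()}
    balanced = sum(per_accent.values()) / len(per_accent)
    return correct / len(reference), per_accent, balanced
```

Per-accent accuracy here is the recall for each true accent, so it also shows which accents the model confuses most often.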