Whisper-large-v3-narrow-accent open-source model - Accurately identify 16 English accents

Whisper Large V3 Narrow Accent

Developed by tiantiaf

A fine-grained accent classification model based on Whisper-Large v3, supporting recognition of 16 English accents

Language: English

Open Source License: BSD-3-Clause

Tags: #Multi-accent classification #Speech feature extraction #Short audio optimization

Downloads 237

Release Time: 5/22/2025

Model Overview

This model implements a fine-grained accent classification method capable of identifying 16 different English accent types, suitable for speech feature analysis and speaker classification tasks.

Model Features

Fine-grained accent classification

Capable of recognizing 16 different English accent types, including East Asian, English, Germanic, and others

Synthetic speech behavior

TTS (synthetic speech) samples have a higher tendency to be classified as Germanic

Based on Whisper-Large v3

Built on the Whisper-Large v3 foundation model (openai/whisper-large-v3) and reuses its speech representations

Model Capabilities

English accent classification

Speech feature extraction

Speaker feature analysis

Use Cases

Speech analysis

Speaker accent recognition

Identify the English accent type of a speaker

Can accurately classify 16 different English accents

Speech feature analysis

Extract accent features from speech

Can be used for speaker feature analysis

Speech technology research

Speech model benchmarking

As part of speech foundation model benchmarking

Provides standardized accent classification evaluation
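For the speaker feature analysis use case, the usage code on this page shows the model returning an embedding alongside the logits (`return_feature=True`). One way to work with such embeddings is to compare two clips by cosine similarity. The sketch below is a minimal illustration in plain Python; the helper name `cosine_similarity` is hypothetical, and whether cosine similarity is a meaningful distance for these particular embeddings is an assumption, not something the source states.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors.

    Hypothetical helper for illustration; with the real model you would
    pass something like embeddings[0].tolist() for each clip.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Values near 1 mean the two vectors point in nearly the same direction; values near 0 mean they are close to orthogonal.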

🚀 Whisper-Large v3 for Narrow Accent Classification

This model performs narrow (fine-grained) accent classification, assigning English speech to one of 16 accent categories.

🚀 Quick Start

Download repo

git clone https://github.com/tiantiaf0627/vox-profile-release.git

Install the package

conda create -n vox_profile python=3.8
conda activate vox_profile
cd vox-profile-release
pip install -e .

Load the model

# Load libraries
import torch
import torch.nn.functional as F
from src.model.accent.whisper_accent import WhisperWrapper

# Pick a device (GPU if available, otherwise CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model from Huggingface
model = WhisperWrapper.from_pretrained("tiantiaf/whisper-large-v3-narrow-accent").to(device)
model.eval()

Prediction

# Label List
english_accent_list = [
    'East Asia', 'English', 'Germanic', 'Irish', 
    'North America', 'Northern Irish', 'Oceania', 
    'Other', 'Romance', 'Scottish', 'Semitic', 'Slavic', 
    'South African', 'Southeast Asia', 'South Asia', 'Welsh'
]
    
# Load data; one second of zeros stands in for real audio here
# The training data filters out audio shorter than 3 seconds (unreliable predictions) and longer than 15 seconds (computation limitation)
# So prepare your audio as 16 kHz, mono, and at most 15 seconds long
max_audio_length = 15 * 16000
data = torch.zeros([1, 16000]).float().to(device)[:, :max_audio_length]

# Run inference without tracking gradients
with torch.no_grad():
    logits, embeddings = model(data, return_feature=True)

# Probability and output
accent_prob = F.softmax(logits, dim=1)
print(english_accent_list[torch.argmax(accent_prob, dim=1).item()])
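The comments in the prediction code ask for 16 kHz mono audio between 3 and 15 seconds long. The sketch below shows one way to enforce those constraints on a 16-bit PCM WAV file using only the Python standard library. The helper name `load_clip` and the constants are hypothetical, not part of the vox-profile package, and the helper does not resample: it simply rejects audio that is not already 16 kHz.

```python
import array
import sys
import wave

TARGET_SR = 16000   # sample rate the prediction code assumes
MIN_SECONDS = 3     # shorter clips were filtered out of the training data
MAX_SECONDS = 15    # longer clips were filtered out of the training data


def load_clip(wav_source):
    """Read a 16-bit PCM WAV and return mono float samples in [-1, 1].

    Hypothetical helper: raises ValueError for the wrong format or for
    clips shorter than MIN_SECONDS, and crops to MAX_SECONDS.
    """
    with wave.open(wav_source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        if wav.getframerate() != TARGET_SR:
            raise ValueError("resample to 16 kHz before calling the model")
        channels = wav.getnchannels()
        pcm = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV samples are stored little-endian

    # Average the interleaved channels down to mono and scale to [-1, 1].
    mono = [
        sum(pcm[i:i + channels]) / (channels * 32768.0)
        for i in range(0, len(pcm), channels)
    ]
    if len(mono) < MIN_SECONDS * TARGET_SR:
        raise ValueError("clip is shorter than 3 seconds")
    return mono[:MAX_SECONDS * TARGET_SR]
```

Assuming a local file named `speech.wav` (a placeholder name), `torch.tensor([load_clip("speech.wav")]).float().to(device)` would produce a tensor with the same `[1, num_samples]` layout as the zeros placeholder above.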

✨ Features

Accent Classification: This model can classify narrow English accents, including 'East Asia', 'English', 'Germanic', etc.
Based on Whisper: Built on the openai/whisper-large-v3 base model.
Benchmark Support: Implements the narrow accent classification described in Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits (https://arxiv.org/pdf/2505.14648).

📚 Documentation

Model Description

This model includes the implementation of narrow accent classification described in Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits (https://arxiv.org/pdf/2505.14648).

The included English accents are:

[
  'East Asia', 'English', 'Germanic', 'Irish', 
  'North America', 'Northern Irish', 'Oceania', 
  'Other', 'Romance', 'Scottish', 'Semitic', 'Slavic', 
  'South African', 'Southeast Asia', 'South Asia', 'Welsh'
]
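The usage code on this page prints only the single most likely accent. To inspect how probability is spread across the 16 labels, the logits can be turned into a ranked list. The sketch below uses plain Python floats rather than torch tensors; the helper name `top_k_accents` is hypothetical, and it assumes that logit index i corresponds to entry i of the label list, as the argmax lookup in the usage code implies.

```python
import math

ENGLISH_ACCENT_LIST = [
    'East Asia', 'English', 'Germanic', 'Irish',
    'North America', 'Northern Irish', 'Oceania',
    'Other', 'Romance', 'Scottish', 'Semitic', 'Slavic',
    'South African', 'Southeast Asia', 'South Asia', 'Welsh'
]


def top_k_accents(logits, k=3):
    """Return the k most likely (label, probability) pairs for one clip.

    Hypothetical helper; with the real model you would pass something
    like logits[0].tolist().
    """
    if len(logits) != len(ENGLISH_ACCENT_LIST):
        raise ValueError("expected one logit per accent label")
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(
        zip(ENGLISH_ACCENT_LIST, probs),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]
```

Looking at the runner-up labels can help when interpreting borderline predictions, for example the tendency noted for TTS samples.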

Observations about this model so far (the list will grow as we observe more):

TTS samples have a higher tendency to be recognized as Germanic.

Library: https://github.com/tiantiaf0627/vox-profile-release

📄 License

This model is released under the BSD-3-Clause license.

📚 Citation

@article{feng2025vox,
  title={Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits},
  author={Feng, Tiantian and Lee, Jihwan and Xu, Anfeng and Lee, Yoonjeong and Lertpetchpun, Thanathai and Shi, Xuan and Wang, Helin and Thebaud, Thomas and Moro-Velazquez, Laureano and Byrd, Dani and others},
  journal={arXiv preprint arXiv:2505.14648},
  year={2025}
}

📦 Model Information

Property	Details
Model Type	audio-classification
Base Model	openai/whisper-large-v3
Datasets	mozilla-foundation/common_voice_11_0
Metrics	accuracy
