wav2vec2-base-da-ft-nst Open-Source Danish Speech Recognition Model - Supports Accurate Recognition of 16kHz Audio

Wav2vec2 Base Da Ft Nst

Developed by Alvenir

Danish speech recognition model fine-tuned on the NST dataset, supporting 16kHz sampled audio input

Speech Recognition

Transformers

Open Source License: Apache-2.0
Tags: Danish speech recognition, NST dataset optimization, 16kHz sampling rate

Release Time: 3/15/2022

Model Overview

This is a Danish wav2vec2 model fine-tuned by Alvenir on the public NST dataset, specifically designed for Danish speech-to-text tasks.

Model Features

Danish language optimization

Specially fine-tuned for Danish, excelling in Danish speech recognition tasks

16kHz sampling rate support

Supports 16kHz sampled audio input, ensuring compatibility with common speech data

Public dataset training

Trained on the public NST dataset, offering good transparency and reproducibility

Model Capabilities

Danish speech recognition

Speech-to-text

Use Cases

Speech transcription

Danish meeting minutes

Automatically convert Danish meeting recordings into text transcripts

Achieved 15.8% WER on the NST test set (greedy decoding)

Danish customer service call transcription

Automatically transcribe Danish customer service call contents

Achieved 19.0% WER on the alvenir-asr-da evaluation set (greedy decoding)

🚀 wav2vec2-base-da-ft-nst

This is a Danish Automatic Speech Recognition (ASR) model based on the Alvenir wav2vec2 model, fine-tuned by Alvenir on the public NST dataset. It offers high-quality speech-to-text conversion for Danish.

🚀 Quick Start

This model was trained on 16kHz audio data, so make sure your input audio has the same sample rate. It was initially trained using fairseq and then converted to the huggingface/transformers format.
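Recordings often arrive at other sample rates (44.1kHz, 48kHz) or in stereo. The sketch below shows one way to bring such audio to mono 16kHz before passing it to the processor. The helper name to_mono_16k and the choice of scipy's polyphase resampler are assumptions made for illustration; they are not part of the original model card, and any good resampler would do.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_RATE = 16_000  # sample rate the model was trained on


def to_mono_16k(audio: np.ndarray, sample_rate: int) -> np.ndarray:
    """Return a mono float32 waveform at 16 kHz (hypothetical helper)."""
    audio = np.asarray(audio, dtype=np.float32)
    # soundfile returns (frames, channels) for multi-channel files; average to mono.
    if audio.ndim == 2:
        audio = audio.mean(axis=1)
    if sample_rate == TARGET_RATE:
        return audio
    # Polyphase resampling by the reduced ratio TARGET_RATE / sample_rate.
    divisor = gcd(TARGET_RATE, sample_rate)
    resampled = resample_poly(audio, TARGET_RATE // divisor, sample_rate // divisor)
    return resampled.astype(np.float32)
```

With soundfile, this would be called as to_mono_16k(*sf.read(path)), since sf.read returns the waveform followed by its sample rate.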

Alvenir is always willing to assist with your open-source ASR projects, customized domain specializations, or premium models. ;-)

✨ Features

Danish ASR: Specifically fine-tuned for Danish speech-to-text tasks.
Sample Rate Requirement: Trained on 16kHz audio, so input audio should use the same sample rate.
Format Compatibility: Converted to the huggingface/transformers format for easy integration.

📦 Installation

The usage example relies on the Python libraries transformers, soundfile, and torch. Install the transformers library with the following command:

pip install transformers

You may also need to install soundfile and torch according to your environment:

pip install soundfile torch

💻 Usage Examples

Basic Usage

import soundfile as sf
import torch

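from transformers import Wav2Vec2CTCTokenizer, Wav2Vec2Processor, Wav2Vec2ForCTC


def get_tokenizer(model_path: str) -> Wav2Vec2CTCTokenizer:
    # Use Wav2Vec2CTCTokenizer (not the deprecated Wav2Vec2Tokenizer) so the
    # returned object matches the declared return type.
    return Wav2Vec2CTCTokenizer.from_pretrained(model_path)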


def get_processor(model_path: str) -> Wav2Vec2Processor:
    return Wav2Vec2Processor.from_pretrained(model_path)


def load_model(model_path: str) -> Wav2Vec2ForCTC:
    return Wav2Vec2ForCTC.from_pretrained(model_path)


model_id = "Alvenir/wav2vec2-base-da-ft-nst"

model = load_model(model_id)
model.eval()
tokenizer = get_tokenizer(model_id)
processor = get_processor(model_id)

audio_file = "<path/to/audio.wav>"

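# soundfile also returns the file's sample rate; the model expects 16kHz input.
audio, sample_rate = sf.read(audio_file)
if sample_rate != 16_000:
    raise ValueError(f"Expected 16kHz audio, got {sample_rate}Hz")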

input_values = processor(audio, return_tensors="pt", padding="longest", sampling_rate=16_000).input_values
with torch.no_grad():
    logits = model(input_values).logits

predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
print(transcription)
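The argmax followed by batch_decode in the example above amounts to greedy CTC decoding: take the most likely token per frame, collapse consecutive repeats, drop the blank token, and join what remains. The sketch below illustrates that idea on a toy input. The vocabulary is hypothetical, and the use of "<pad>" as the blank and "|" as the word delimiter is an assumption based on the common wav2vec2 tokenizer convention; the real vocabulary ships with the model.

```python
from itertools import groupby
from typing import List, Sequence


def greedy_ctc_decode(ids: Sequence[int], vocab: List[str], blank_id: int = 0,
                      word_delimiter: str = "|") -> str:
    """Collapse repeated ids, drop blanks, and map the rest to text."""
    # Step 1: merge runs of identical frame predictions into one id.
    collapsed = [token_id for token_id, _ in groupby(ids)]
    # Step 2: remove the CTC blank, then look up the remaining characters.
    chars = [vocab[i] for i in collapsed if i != blank_id]
    # Step 3: turn the word delimiter into spaces.
    return "".join(chars).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary and per-frame argmax ids, for illustration only.
toy_vocab = ["<pad>", "|", "h", "e", "j"]
frame_ids = [2, 2, 0, 3, 3, 3, 0, 4, 1, 1, 2, 0, 3, 4]
print(greedy_ctc_decode(frame_ids, toy_vocab))  # hej hej
```

A blank between two identical ids keeps them as separate characters, which is how CTC represents doubled letters.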

📚 Documentation

Benchmark results

Here are some benchmark results on publicly available Danish datasets.

Dataset	WER (greedy)	WER (with 3-gram language model)
NST test	15.8%	11.9%
alvenir-asr-da-eval	19.0%	12.1%
common_voice_80 da test	26.3%	19.2%
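For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between the reference transcript and the model output (substitutions, deletions, and insertions), divided by the number of reference words. The sketch below is a minimal implementation of that definition. It is not the evaluation script used for the table above, and it assumes plain whitespace tokenization with no text normalization, which the source does not specify.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One of four reference words is missing from the hypothesis.
print(word_error_rate("det er en test", "det er test"))  # 0.25
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.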

📄 License

This project is licensed under the Apache-2.0 license.
