🚀 NB-Whisper Large
The Norwegian NB-Whisper Large model is proudly developed by the National Library of Norway. NB-Whisper is a state-of-the-art series of models for automatic speech recognition (ASR) and speech translation, based on OpenAI's Whisper. Each model in the series was trained for 250,000 steps on a diverse dataset of 8 million samples, 30-second aligned audio clips totaling 66,000 hours of speech. Stay tuned for our upcoming article with in-depth details on our training methodology and dataset composition.
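As a quick back-of-the-envelope check, 8 million 30-second clips do work out to roughly 66,000 hours:

```python
# Sanity check of the dataset size quoted above.
samples = 8_000_000      # aligned 30-second audio clips
clip_seconds = 30        # seconds per clip
total_hours = samples * clip_seconds / 3600
print(f"{total_hours:,.0f} hours")  # 66,667 hours, in line with the stated ~66,000
```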
✨ Features
Model Variants
- Main Models: A series of models in different sizes and parameter counts, suitable for various ASR and speech translation tasks.
| Model Size | Parameters | Model |
|------------|------------|------------|
| Tiny | 39M | NB-Whisper Tiny |
| Base | 74M | NB-Whisper Base |
| Small | 244M | NB-Whisper Small |
| Medium | 769M | NB-Whisper Medium |
| Large | 1550M | NB-Whisper Large |
- Verbatim Model: Trained for an additional 250 steps from the main models; these versions are more literal, making them suitable for tasks such as linguistic analysis.
| Model Size | Parameters | Verbatim version |
|------------|------------|------------------|
| Tiny | 39M | Tiny - verbatim |
| Base | 74M | Base - verbatim |
| Small | 244M | Small - verbatim |
| Medium | 769M | Medium - verbatim |
| Large | 1550M | Large - verbatim |
Model Description
| Property | Details |
|----------|---------|
| Developed by | NB AI-Lab |
| Shared by | NB AI-Lab |
| Model Type | whisper |
| Language(s) (NLP) | Norwegian, Norwegian Bokmål, Norwegian Nynorsk, English |
| License | Apache 2.0 |
| Trained from model | openai/whisper-large-v3 |
| Code Repository | https://github.com/NbAiLab/nb-whisper/ |
| Paper | Coming soon |
| Demo | See Spaces on this page |
📦 Installation
Local Setup with HuggingFace
If you want to run the models locally, follow these steps:
```bash
# Download a sample audio file
wget -N https://github.com/NbAiLab/nb-whisper/raw/main/audio/king.mp3

# Install transformers (quote the spec so the shell does not treat ">" as redirection)
pip install "transformers>=4.35.2"
```
Whisper CPP
```bash
# Clone and build whisper.cpp
git clone --depth 1 https://github.com/ggerganov/whisper.cpp --branch v1.5.1
cd whisper.cpp/
make

# Download the sample audio and convert it to 16 kHz mono 16-bit PCM WAV
wget -N https://github.com/NbAiLab/nb-whisper/raw/main/audio/king.mp3
ffmpeg -i king.mp3 -ar 16000 -ac 1 -c:a pcm_s16le king.wav

# Download the NB-Whisper Large ggml models (full and 5-bit quantized)
wget -N https://huggingface.co/NbAiLab/nb-whisper-large/resolve/main/ggml-model.bin -O models/nb-large-ggml-model.bin
wget -N https://huggingface.co/NbAiLab/nb-whisper-large/resolve/main/ggml-model-q5_0.bin -O models/nb-large-ggml-model-q5_0.bin
```
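whisper.cpp expects 16 kHz, mono, 16-bit PCM WAV input, which is what the ffmpeg command above produces. A small sketch using Python's standard `wave` module to verify a file before transcribing (the helper name is our own):

```python
import wave

def is_whisper_cpp_ready(path):
    """Return True if the WAV file is 16 kHz, mono, 16-bit PCM,
    the input format whisper.cpp expects."""
    with wave.open(path, "rb") as w:
        return (w.getframerate() == 16000
                and w.getnchannels() == 1
                and w.getsampwidth() == 2)

# e.g. is_whisper_cpp_ready("king.wav") after the ffmpeg step above
```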
WhisperX and Speaker Diarization
```bash
# Log in to Hugging Face (required for the diarization models)
huggingface-cli login

# Download a sample audio file
wget -N https://github.com/NbAiLab/nb-whisper/raw/main/audio/knuthamsun.mp3

# Install WhisperX (pinned commit)
pip uninstall whisperx && pip install git+https://github.com/m-bain/whisperx.git@8540ff5985fceee764acbed94f656063d7f56540
```
💻 Usage Examples
Basic Usage
```python
from transformers import pipeline

# Load the model
asr = pipeline("automatic-speech-recognition", "NbAiLabBeta/nb-whisper-large")

# Transcribe to Norwegian Bokmål
asr("king.mp3", generate_kwargs={'task': 'transcribe', 'language': 'no'})
```
Advanced Usage
```python
# Long-form transcription with chunking
asr("king.mp3", chunk_length_s=28, generate_kwargs={'task': 'transcribe', 'language': 'no'})

# Increase accuracy with beam search, and return segment timestamps
asr("king.mp3", chunk_length_s=28, return_timestamps=True, generate_kwargs={'num_beams': 5, 'task': 'transcribe', 'language': 'no'})

# Return segment timestamps
asr("king.mp3", chunk_length_s=28, return_timestamps=True, generate_kwargs={'task': 'transcribe', 'language': 'no'})

# Return word-level timestamps
asr("king.mp3", chunk_length_s=28, return_timestamps="word", generate_kwargs={'task': 'transcribe', 'language': 'no'})

# Transcribe to Nynorsk
asr("king.mp3", chunk_length_s=28, generate_kwargs={'task': 'transcribe', 'language': 'nn'})

# Transcribe to English
asr("king.mp3", chunk_length_s=28, generate_kwargs={'task': 'transcribe', 'language': 'en'})
```
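The `chunk_length_s` argument makes the pipeline cut long audio into fixed-size windows that overlap by a stride on each side (the transformers default stride is `chunk_length_s / 6`). A rough, illustrative sketch of that windowing, not the library's internal code:

```python
def chunk_windows(total_s, chunk_s=28.0, stride_s=None):
    """Illustrative sketch of how long audio is split into overlapping
    windows for chunked transcription. Not the pipeline's actual code."""
    if stride_s is None:
        stride_s = chunk_s / 6           # transformers' default stride
    step = chunk_s - 2 * stride_s        # how far each window advances
    windows, start = [], 0.0
    while True:
        end = min(start + chunk_s, total_s)
        windows.append((start, end))
        if end >= total_s:
            break
        start += step
    return windows

# A 60-second file yields three overlapping 28-second windows
print(chunk_windows(60.0))
```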
Whisper CPP
```bash
# Transcribe with the full model
./main -l no -m models/nb-large-ggml-model.bin king.wav

# Or with the smaller quantized model
./main -l no -m models/nb-large-ggml-model-q5_0.bin king.wav
```
WhisperX and Speaker Diarization
```bash
whisperx knuthamsun.mp3 --model NbAiLabBeta/nb-whisper-large --language no --diarize
```
📚 Documentation
Online Demos
You can try the models directly through the HuggingFace Inference API, accessible on the right side of this page. Note that initially, the model needs to load and will run on limited CPU capacity, which might be slow. To enhance your experience, we are temporarily hosting some models on TPUs for a few days, significantly boosting their performance. Explore these under the Spaces section on the Main Page.
API
Instructions for accessing the models via a simple API are included in the demos under Spaces. Note that these demos are temporary and will only be available for a few weeks.
🔧 Technical Details
Training Data
The training data comes from Språkbanken and the National Library of Norway's digital collection, including:
- NST Norwegian ASR Database (16 kHz) and its corresponding dataset
- Transcribed speeches from the Norwegian Parliament by Språkbanken
- TV broadcast (NRK) subtitles (NLN digital collection)
- Audiobooks (NLN digital collection)
Downstream Use
The models, especially the smaller ones, may occasionally hallucinate and may drop parts of the transcript. They are designed to convert spoken language into grammatically correct written sentences, which might not always be word-for-word transcriptions. We have made two extra model variants for users who want a different transcription style.
Software
The model was trained using Jax/Flax and converted to PyTorch, TensorFlow, whisper.cpp, and ONNX formats. These are available under Files and versions. We welcome requests for conversion to other formats. All training code and scripts are released under the Apache License 2.0 in the GitHub repository nb-whisper.
📄 License
This model is released under the Apache 2.0 license. Note that for downloads made in Norway, the attribution requirements of the Norwegian Copyright Act still apply where relevant, even though they are not explicitly mentioned in the Apache License; attribution might not be required when the model is downloaded and used in other countries.
Citation & Contributors
The NB-Whisper Large model is a product of the NoSTram project led by Per Egil Kummervold (@pere) at the National Library of Norway. Key contributors include Javier de la Rosa (@versae), Freddy Wetjen (@freddyw), and Rolv-Arild Braaten (@Rolv-Arild). NB AI-Lab, under the direction of Svein Arne Brygfjeld (@Brygfjeld), supported the project's successful completion. A detailed paper on our process and findings is forthcoming.
Disclaimer
The models published in this repository are intended for a generalist purpose and are available to third parties. These models may have bias and/or any other undesirable distortions. When third parties deploy or provide systems and/or services to other parties using any of these models (or using systems based on these models) or become users of the models, they should note that it is their responsibility to mitigate the risks arising from their use and, in any event, to comply with applicable regulations, including regulations regarding the use of artificial intelligence. In no event shall the owner of the models (The National Library of Norway) be liable for any results arising from the use made by third parties of these models.