🚀 NB-Whisper Large
The Norwegian NB-Whisper Large model is proudly developed by the National Library of Norway. NB-Whisper is a state-of-the-art series of models for automatic speech recognition (ASR) and speech translation, based on OpenAI's Whisper. Each model in the series was trained for 250,000 steps on a diverse dataset of 8 million samples, 30-second aligned audio clips totaling 66,000 hours of speech. Stay tuned for our upcoming article with in-depth details on our training methodology and dataset composition.
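As a quick back-of-the-envelope check, 8 million 30-second clips do work out to roughly 66,000 hours:

```python
# Sanity check of the dataset size quoted above.
samples = 8_000_000      # aligned 30-second audio clips
clip_seconds = 30        # seconds per clip
total_hours = samples * clip_seconds / 3600
print(f"{total_hours:,.0f} hours")  # 66,667 hours, in line with the stated ~66,000
```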
✨ Features
Model Variants
- Main Models: A series of models in different sizes and parameter counts, suitable for various ASR and speech translation tasks.
| Model Size | Parameters | Model |
|------------|------------|------------|
| Tiny | 39M | NB-Whisper Tiny |
| Base | 74M | NB-Whisper Base |
| Small | 244M | NB-Whisper Small |
| Medium | 769M | NB-Whisper Medium |
| Large | 1550M | NB-Whisper Large |
- Verbatim Model: Trained for an additional 250 steps from the main models; these versions are more literal, making them suitable for tasks such as linguistic analysis.
| Model Size | Parameters | Verbatim version |
|------------|------------|------------------|
| Tiny | 39M | Tiny - verbatim |
| Base | 74M | Base - verbatim |
| Small | 244M | Small - verbatim |
| Medium | 769M | Medium - verbatim |
| Large | 1550M | Large - verbatim |
Model Description
| Property | Details |
|----------|---------|
| Developed by | NB AI-Lab |
| Shared by | NB AI-Lab |
| Model Type | whisper |
| Language(s) (NLP) | Norwegian, Norwegian Bokmål, Norwegian Nynorsk, English |
| License | Apache 2.0 |
| Trained from model | openai/whisper-large-v3 |
| Code Repository | https://github.com/NbAiLab/nb-whisper/ |
| Paper | Coming soon |
| Demo | See Spaces on this page |
📦 Installation
Local Setup with HuggingFace
If you want to run the models locally, follow these steps:
```bash
# Download a sample audio file
wget -N https://github.com/NbAiLab/nb-whisper/raw/main/audio/king.mp3

# Install transformers (quote the spec so the shell does not treat ">" as redirection)
pip install "transformers>=4.35.2"
```
Whisper CPP
```bash
# Clone and build whisper.cpp
git clone --depth 1 https://github.com/ggerganov/whisper.cpp --branch v1.5.1
cd whisper.cpp/
make

# Download the sample audio and convert it to 16 kHz mono 16-bit PCM WAV
wget -N https://github.com/NbAiLab/nb-whisper/raw/main/audio/king.mp3
ffmpeg -i king.mp3 -ar 16000 -ac 1 -c:a pcm_s16le king.wav

# Download the NB-Whisper Large ggml models (full and 5-bit quantized)
wget -N https://huggingface.co/NbAiLab/nb-whisper-large/resolve/main/ggml-model.bin -O models/nb-large-ggml-model.bin
wget -N https://huggingface.co/NbAiLab/nb-whisper-large/resolve/main/ggml-model-q5_0.bin -O models/nb-large-ggml-model-q5_0.bin
```
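whisper.cpp expects 16 kHz, mono, 16-bit PCM WAV input, which is what the ffmpeg command above produces. A small sketch using Python's standard `wave` module to verify a file before transcribing (the helper name is our own):

```python
import wave

def is_whisper_cpp_ready(path):
    """Return True if the WAV file is 16 kHz, mono, 16-bit PCM,
    the input format whisper.cpp expects."""
    with wave.open(path, "rb") as w:
        return (w.getframerate() == 16000
                and w.getnchannels() == 1
                and w.getsampwidth() == 2)

# e.g. is_whisper_cpp_ready("king.wav") after the ffmpeg step above
```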
WhisperX and Speaker Diarization
```bash
# Log in to Hugging Face (required for the diarization models)
huggingface-cli login

# Download a sample audio file
wget -N https://github.com/NbAiLab/nb-whisper/raw/main/audio/knuthamsun.mp3

# Install WhisperX (pinned commit)
pip uninstall whisperx && pip install git+https://github.com/m-bain/whisperx.git@8540ff5985fceee764acbed94f656063d7f56540
```
💻 Usage Examples
Basic Usage
```python
from transformers import pipeline

# Load the model
asr = pipeline("automatic-speech-recognition", "NbAiLabBeta/nb-whisper-large")

# Transcribe to Norwegian Bokmål
asr("king.mp3", generate_kwargs={'task': 'transcribe', 'language': 'no'})
```
Advanced Usage
```python
# Long-form transcription with chunking
asr("king.mp3", chunk_length_s=28, generate_kwargs={'task': 'transcribe', 'language': 'no'})

# Increase accuracy with beam search, and return segment timestamps
asr("king.mp3", chunk_length_s=28, return_timestamps=True, generate_kwargs={'num_beams': 5, 'task': 'transcribe', 'language': 'no'})

# Return segment timestamps
asr("king.mp3", chunk_length_s=28, return_timestamps=True, generate_kwargs={'task': 'transcribe', 'language': 'no'})

# Return word-level timestamps
asr("king.mp3", chunk_length_s=28, return_timestamps="word", generate_kwargs={'task': 'transcribe', 'language': 'no'})

# Transcribe to Nynorsk
asr("king.mp3", chunk_length_s=28, generate_kwargs={'task': 'transcribe', 'language': 'nn'})

# Transcribe to English
asr("king.mp3", chunk_length_s=28, generate_kwargs={'task': 'transcribe', 'language': 'en'})
```
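The `chunk_length_s` argument makes the pipeline cut long audio into fixed-size windows that overlap by a stride on each side (the transformers default stride is `chunk_length_s / 6`). A rough, illustrative sketch of that windowing, not the library's internal code:

```python
def chunk_windows(total_s, chunk_s=28.0, stride_s=None):
    """Illustrative sketch of how long audio is split into overlapping
    windows for chunked transcription. Not the pipeline's actual code."""
    if stride_s is None:
        stride_s = chunk_s / 6           # transformers' default stride
    step = chunk_s - 2 * stride_s        # how far each window advances
    windows, start = [], 0.0
    while True:
        end = min(start + chunk_s, total_s)
        windows.append((start, end))
        if end >= total_s:
            break
        start += step
    return windows

# A 60-second file yields three overlapping 28-second windows
print(chunk_windows(60.0))
```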
Whisper CPP
```bash
# Transcribe with the full model
./main -l no -m models/nb-large-ggml-model.bin king.wav

# Or with the smaller quantized model
./main -l no -m models/nb-large-ggml-model-q5_0.bin king.wav
```
WhisperX and Speaker Diarization
```bash
whisperx knuthamsun.mp3 --model NbAiLabBeta/nb-whisper-large --language no --diarize
```
📚 Documentation
Online Demos
You can try the models directly through the HuggingFace Inference API, accessible on the right side of this page. Note that initially, the model needs to load and will run on limited CPU capacity, which might be slow. To enhance your experience, we are temporarily hosting some models on TPUs for a few days, significantly boosting their performance. Explore these under the Spaces section on the Main Page.
API
Instructions for accessing the models via a simple API are included in the demos under Spaces. Note that these demos are temporary and will only be available for a few weeks.
🔧 Technical Details
Training Data
The training data comes from Språkbanken and the National Library of Norway's digital collection, including:
- NST Norwegian ASR Database (16 kHz) and its corresponding dataset
- Transcribed speeches from the Norwegian Parliament by Språkbanken
- TV broadcast (NRK) subtitles (NLN digital collection)
- Audiobooks (NLN digital collection)
Downstream Use
The models, especially the smaller ones, may occasionally hallucinate and may drop parts of the transcript. They are designed to convert spoken language into grammatically correct written sentences, which might not always be word-for-word transcriptions. We have made two extra model variants for users who want a different transcription style.
Software
The model was trained using Jax/Flax and converted to PyTorch, TensorFlow, whisper.cpp, and ONNX formats. These are available under Files and versions. We welcome requests for conversion to other formats. All training code and scripts are released under the Apache License 2.0 in the GitHub repository nb-whisper.
📄 License
This model is released under the Apache 2.0 license. Note that for downloads made in Norway, the attribution requirements of the Norwegian Copyright Act still apply where relevant, even though they are not explicitly mentioned in the Apache License; attribution might not be required when the model is downloaded and used in other countries.
Citation & Contributors
The NB-Whisper Large model is a product of the NoSTram project led by Per Egil Kummervold (@pere) at the National Library of Norway. Key contributors include Javier de la Rosa (@versae), Freddy Wetjen (@freddyw), and Rolv-Arild Braaten (@Rolv-Arild). NB AI-Lab, under the direction of Svein Arne Brygfjeld (@Brygfjeld), supported the project's successful completion. A detailed paper on our process and findings is forthcoming.
Disclaimer
The models published in this repository are intended for a generalist purpose and are available to third parties. These models may have bias and/or any other undesirable distortions. When third parties deploy or provide systems and/or services to other parties using any of these models (or using systems based on these models) or become users of the models, they should note that it is their responsibility to mitigate the risks arising from their use and, in any event, to comply with applicable regulations, including regulations regarding the use of artificial intelligence. In no event shall the owner of the models (The National Library of Norway) be liable for any results arising from the use made by third parties of these models.