Open-source model of vits_icelandic_rosa_female_monospeaker

Vits Icelandic Rosa Female Monospeaker

Developed by Sigurdur

This is an Icelandic text-to-speech model fine-tuned from facebook/mms-tts-isl on the Talrómur dataset. It specializes in synthesizing a female voice.

Speech Synthesis · Transformers · Icelandic TTS · Single-speaker speech synthesis · VITS architecture


Release date: 1/20/2025

Model Overview

This model converts Icelandic text into natural-sounding speech and is particularly suited to applications that require a female voice.

Model Features

High-quality speech synthesis

Based on the VITS architecture, capable of generating natural and fluent Icelandic speech

Female single-speaker

Specializes in synthesizing the Icelandic female voice named Rósa

Fine-tuned based on MMS-TTS

Fine-tuned from the facebook/mms-tts-isl base model, inheriting its speech synthesis capabilities

Model Capabilities

Icelandic text-to-speech

Female voice synthesis

16 kHz audio output
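Because the output rate is fixed, the length of a generated clip follows directly from its sample count: duration = samples / sampling_rate. A minimal sketch of that arithmetic, assuming a mono waveform at 16 kHz; `clip_duration_seconds` and `samples_for_duration` are hypothetical helpers, not part of the model's API.

```python
def clip_duration_seconds(num_samples: int, sampling_rate: int = 16000) -> float:
    """Return the length in seconds of a mono clip with num_samples samples."""
    if sampling_rate <= 0:
        raise ValueError(f"Invalid sampling rate: {sampling_rate}")
    return num_samples / sampling_rate


def samples_for_duration(seconds: float, sampling_rate: int = 16000) -> int:
    """Return how many samples cover `seconds` of audio, rounded to the nearest sample."""
    return round(seconds * sampling_rate)


# 40,000 samples at 16 kHz is 2.5 seconds of speech
print(clip_duration_seconds(40_000))  # 2.5
# A quarter-second pause at 16 kHz needs 4,000 samples
print(samples_for_duration(0.25))  # 4000
```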

Use Cases

Voice assistants

Icelandic voice assistant

Provides natural speech output for Icelandic voice assistants

Audiobooks

Icelandic audio content production

Converts Icelandic text into audio content
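For long inputs such as audiobook chapters, one common approach is to synthesize sentence-sized chunks one at a time. The model card does not say how long inputs are handled, so this is a sketch under assumptions: `split_into_chunks` is a hypothetical helper, and the 200-character limit is an arbitrary choice.

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Split text at sentence-ending punctuation, then pack whole sentences into
    chunks of at most max_chars characters (a longer sentence stays on its own)."""
    # Split after '.', '!' or '?' followed by whitespace; Icelandic letters need no special handling
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the current chunk
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = "Góðan daginn! Ég heiti Rósa. Hvað segir þú?"
print(split_into_chunks(text, max_chars=20))
# ['Góðan daginn!', 'Ég heiti Rósa.', 'Hvað segir þú?']
```

Each chunk can then be passed through the tokenizer and model separately, as in the Quick Start example.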

🚀 Model Card for Sigurdur/vits_icelandic_rosa_female_monospeaker

This is a text-to-speech model for Icelandic. It is fine-tuned from facebook/mms-tts-isl using the Talrómur dataset, offering a practical solution for Icelandic text-to-speech applications.

🚀 Quick Start

This model is designed for text-to-speech applications in Icelandic. Here's a basic example of how to use it:

from transformers import VitsModel, AutoTokenizer
import scipy.io.wavfile as wav
import torch

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = VitsModel.from_pretrained("Sigurdur/vits_icelandic_rosa_female_monospeaker")
tokenizer = AutoTokenizer.from_pretrained("Sigurdur/vits_icelandic_rosa_female_monospeaker")

text = "Góðan daginn! Ég heiti Rósa, ég er talgervill"

# Tokenize the text and synthesize speech; gradients are not needed for inference
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    output = model(**inputs).waveform

# Read the output sampling rate from the model config (default to 16 kHz if not set)
sampling_rate = getattr(model.config, "sampling_rate", 16000)
if not (0 < sampling_rate <= 65535):
    raise ValueError(f"Invalid sampling rate: {sampling_rate}")

waveform = output.squeeze().cpu().numpy()  # Remove the batch dimension

Save Output to File

wav.write("output.wav", rate=sampling_rate, data=waveform)
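`scipy.io.wavfile.write` stores a float32 array as a 32-bit floating-point WAV, which some players and tools do not accept. If 16-bit PCM is needed, one option is the following standard-library sketch. `write_wav_pcm16` is a hypothetical helper, and it assumes a mono waveform with values in [-1.0, 1.0].

```python
import array
import sys
import wave


def write_wav_pcm16(path: str, samples, sampling_rate: int = 16000) -> None:
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    # Clip each sample to [-1, 1] and scale it to the signed 16-bit integer range
    pcm = array.array("h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples))
    # WAV stores PCM samples little-endian; swap bytes on big-endian machines
    if sys.byteorder == "big":
        pcm.byteswap()
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(pcm.tobytes())
```

With the Quick Start variables, `write_wav_pcm16("output_pcm16.wav", waveform, sampling_rate)` writes the same audio as 16-bit PCM. Scaling by 32767 rather than 32768 keeps +1.0 inside the int16 range.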

View in Jupyter Notebook

from IPython.display import Audio

# Show an audio player for the generated waveform
Audio(waveform, rate=sampling_rate)

✨ Features

Icelandic Support: Specifically fine-tuned for Icelandic text-to-speech, providing high-quality voice output for the Icelandic language.
Based on VITS: Built on the VITS architecture, an end-to-end model that generates the waveform directly from text.

📦 Installation

No dedicated installation steps are provided. The examples require the transformers, torch and scipy Python packages; the Jupyter example additionally uses IPython.

💻 Usage Examples

Basic Usage

Basic usage is the same as the Quick Start example above.

Advanced Usage

The basic usage example covers most common scenarios; the model card provides no further advanced usage guidance.
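One pattern beyond the basic example is synthesizing a long text piece by piece and joining the audio with short pauses. This sketch assumes a `synthesize` callable that returns a mono waveform as a list of floats; the function name `synthesize_long_text` and the 0.3-second default pause are hypothetical choices, not part of the model card.

```python
from typing import Callable, Sequence


def synthesize_long_text(
    chunks: Sequence[str],
    synthesize: Callable[[str], list[float]],
    sampling_rate: int = 16000,
    pause_seconds: float = 0.3,
) -> list[float]:
    """Synthesize each text chunk separately and join the results,
    inserting pause_seconds of silence between consecutive chunks."""
    silence = [0.0] * round(pause_seconds * sampling_rate)
    audio: list[float] = []
    for i, chunk in enumerate(chunks):
        if i > 0:
            audio.extend(silence)
        audio.extend(synthesize(chunk))
    return audio


# Example adapter for the Quick Start objects (assumes `model`, `tokenizer`
# and `torch` are already defined as in that example):
# def synthesize(chunk):
#     inputs = tokenizer(chunk, return_tensors="pt")
#     with torch.no_grad():
#         return model(**inputs).waveform.squeeze().cpu().tolist()
```

Keeping the synthesizer as a parameter makes the joining logic independent of the model, so it can be tested with a stand-in function.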

📚 Documentation

Model Details

Developed by: Sigurdur Haukur Birgisson
Model type: VITS
Language(s) (NLP): Icelandic, isl
License: [More Information Needed]
Finetuned from model: facebook/mms-tts-isl

Uses

This model is intended for Icelandic text-to-speech applications.

Bias, Risks, and Limitations

[More Information Needed]

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information is needed for further recommendations.

Training Data

The model was fine-tuned on the Talrómur dataset; no further details are provided.

Training Hyperparameters

Training regime: fp16

Evaluation

[More Information Needed]

Model Examination

[More Information Needed]

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
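Roughly, the calculator's estimate is the energy consumed (hardware power draw times training time) multiplied by the carbon intensity of the local power grid. A minimal sketch of that arithmetic; the input values are placeholders, since no training hardware, duration or region is reported for this model.

```python
def estimate_emissions_kg(power_kw: float, hours: float, kg_co2eq_per_kwh: float) -> float:
    """Estimate emissions (kg CO2eq) as energy consumed (kWh) times grid carbon intensity."""
    energy_kwh = power_kw * hours
    return energy_kwh * kg_co2eq_per_kwh


# Placeholder values for illustration only, not measurements for this model:
# a 0.3 kW GPU running for 10 hours on a grid emitting 0.4 kg CO2eq per kWh
print(estimate_emissions_kg(power_kw=0.3, hours=10, kg_co2eq_per_kwh=0.4))
```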

Technical Specifications

[More Information Needed]

Citation

[More Information Needed]

Glossary

[More Information Needed]

More Information

[More Information Needed]

📄 License

[More Information Needed]

Model Card Authors

Sigurdur Haukur Birgisson

Model Card Contact

Feel free to contact me through LinkedIn: Sigurdur Haukur Birgisson
