tts-v1-finetuned: Open-Source Danish Text-to-Speech Model for Natural Conversation

tts-v1-finetuned

Developed by syvai

A text-to-speech model trained on more than 1,000 hours of Danish audio, aimed at natural conversational speech synthesis.

Speech Synthesis

Transformers

#Danish TTS #LLAMA Architecture #Natural Conversation Synthesis

Downloads 84

Release date: 4/25/2025

Model Overview

syv.ai's first open-source text-to-speech model, optimized for Danish and built on the LLAMA 3.2 3B architecture. Because it is an LLM, it can be deployed with mainstream inference frameworks.

Model Features

Danish Optimization

Fine-tuned specifically for Danish to synthesize speech in a natural conversational style.

LLM Architecture Compatibility

Uses the LLAMA architecture, so it can be deployed with mainstream inference frameworks such as vLLM and ollama.

Long Sequence Processing

Trained with a sequence length of 8192 tokens, which suits speech synthesis for longer texts.

Efficient Training Configuration

Trained with flash attention and gradient checkpointing to improve training efficiency.

Model Capabilities

Danish text-to-speech

Long text speech synthesis

Natural conversation style speech generation

Use Cases

Voice Interaction Systems

Danish Voice Assistant

Build voice assistants for Danish-speaking users.

Accessibility Services

Text-to-Speech Service

Provide read-aloud services for Danish content to visually impaired users.

🚀 syv.ai TTS v0.1

syv.ai TTS v0.1 is our first open-source text-to-speech model. It is trained on over 1000 hours of Danish audio and offers high-quality text-to-speech conversion.

🚀 Quick Start

Because the model is an LLM, you can run inference with popular inference frameworks such as vLLM or ollama. We recommend following the way inference is implemented in Orpheus.
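
As a rough illustration of the vLLM route, the sketch below builds a request for vLLM's OpenAI-compatible completions endpoint using only the Python standard library. Several things here are assumptions rather than facts from this model card: that a server was started locally (for example with `vllm serve syvai/tts-v1-finetuned`), the `localhost:8000` address, the sampling values, and that the prompt has already been formatted the way Orpheus formats its prompts. The returned text is expected to be audio codec tokens, as in Orpheus, which still have to be decoded into a waveform; that step is not shown.

```python
import json
import urllib.request

# Assumption: a local vLLM OpenAI-compatible server is serving the model.
VLLM_URL = "http://localhost:8000/v1/completions"  # hypothetical local endpoint
MODEL_ID = "syvai/tts-v1-finetuned"  # hub_model_id from the training config


def build_completion_request(prompt: str, max_tokens: int = 2048) -> dict:
    """Build the JSON body for the completions endpoint.

    `prompt` is assumed to be pre-formatted in the Orpheus style; this
    helper does not add any special tokens itself.
    """
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.6,  # illustrative value, not from the model card
    }


def request_audio_tokens(prompt: str) -> str:
    """POST the request and return the raw generated text."""
    body = json.dumps(build_completion_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        VLLM_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]
```

Keeping payload construction separate from the network call makes the request easy to inspect before a server is available.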

✨ Features

Model Details

The model started as a LLAMA 3.2 3B model that was first trained on 100,000 hours of English audio and then fine-tuned to speak Danish.

Seeking More Audio

We are looking for more audio data, especially ordinary conversational audio. If you have suitable audio (preferably not read aloud), please contact us.

📚 Documentation

Model

The model, initially a LLAMA 3.2 3B model, went through two-stage training: it was first trained on a large amount of English audio and then fine-tuned for Danish. As an LLM, it supports inference with vLLM, ollama, or other popular inference frameworks.

Training Configuration

The model was trained using axolotl version 0.8.0 with the following configuration:

base_model: syvai/tts-v1-pretrained
# Automatically upload checkpoint and final model to HF
hub_model_id: syvai/tts-v1-finetuned

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true

datasets:
  - path: syvai/zac-coral-tts
    type: 
dataset_prepared_path: last_run_prepared
val_set_size: 0.01
eval_sample_packing: False
output_dir: ./outputs/finetuned

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

wandb_project: orph
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 4
num_epochs: 3
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 2e-5

bf16: auto
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
resume_from_checkpoint:
logging_steps: 1
flash_attention: true

warmup_steps: 3
evals_per_epoch: 5
saves_per_epoch: 5
weight_decay: 0.05

special_tokens:
  pad_token: <custom_token_7>
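
To make the learning-rate settings in this configuration concrete (`learning_rate: 2e-5`, `lr_scheduler: cosine`, `warmup_steps: 3`), here is a minimal sketch of the common linear-warmup plus cosine-decay schedule. It assumes the trainer uses this standard form, and it assumes roughly 122 optimizer steps in total, a figure estimated from the training results table (about 40.6 steps per epoch over 3 epochs) rather than stated by the authors.

```python
import math


def lr_at_step(step, base_lr=2e-5, warmup_steps=3, total_steps=122):
    """Learning rate at a given optimizer step.

    total_steps=122 is an estimate, not a value from the config.
    """
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for s in (0, 1, 3, 60, 122):
        print(s, lr_at_step(s))
```

With only 3 warmup steps, the schedule reaches the peak rate almost immediately and spends nearly the whole run in the cosine decay.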

tts-v1-finetuned

This model is a fine-tuned version of [syvai/tts-v1-pretrained](https://huggingface.co/syvai/tts-v1-pretrained) on the syvai/zac-coral-tts dataset. It achieves a loss of 4.2860 on the evaluation set.

Model Description

More information is needed.

Intended Uses & Limitations

More information is needed.

Training and Evaluation Data

More information is needed.

Training Procedure

Training Hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 32
optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 3
num_epochs: 3.0
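
The batch-size figures above fit together through simple arithmetic: the total train batch size is the per-device batch size times the gradient accumulation steps times the number of devices. The sketch below checks this and also derives an upper bound on tokens per optimizer step from the configured sequence length of 8192. The helper names are hypothetical, and the single-device conclusion is an inference from the numbers, not something the authors state.

```python
def effective_batch_size(micro_batch_size, gradient_accumulation_steps, num_devices=1):
    """Sequences consumed per optimizer step."""
    return micro_batch_size * gradient_accumulation_steps * num_devices


def implied_num_devices(total_batch_size, micro_batch_size, gradient_accumulation_steps):
    """Number of devices implied by the reported total batch size."""
    per_device = micro_batch_size * gradient_accumulation_steps
    if total_batch_size % per_device != 0:
        raise ValueError("total batch size is not a multiple of the per-device batch")
    return total_batch_size // per_device


def max_tokens_per_step(total_batch_size, sequence_len):
    """Upper bound on tokens per optimizer step when sequences are padded to full length."""
    return total_batch_size * sequence_len


if __name__ == "__main__":
    print(effective_batch_size(4, 8))         # 4 * 8 * 1
    print(implied_num_devices(32, 4, 8))      # devices implied by total of 32
    print(max_tokens_per_step(32, 8192))      # tokens per step, upper bound
```

Since 4 × 8 already equals the reported total of 32, the numbers suggest the run used a single device.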

Training Results

Training Loss	Epoch	Step	Validation Loss
4.9492	0.0246	1	4.8478
4.7181	0.1969	8	4.5872
4.5871	0.3938	16	4.4631
4.557	0.5908	24	4.3972
4.4965	0.7877	32	4.3521
4.4697	0.9846	40	4.3258
4.4525	1.1723	48	4.3083
4.4301	1.3692	56	4.2980
4.4459	1.5662	64	4.2915
4.4382	1.7631	72	4.2893
4.4315	1.96	80	4.2866
4.4178	2.1477	88	4.2861
4.4501	2.3446	96	4.2859
4.4121	2.5415	104	4.2856
4.4164	2.7385	112	4.2859
4.4264	2.9354	120	4.2860
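
The table can be summarized with a few lines of Python. The sketch below copies the step, training loss, and validation loss columns from the table, then reports the row with the lowest validation loss and the total drop in validation loss over the run. The variable and function names are illustrative only.

```python
# (step, training_loss, validation_loss), copied from the table above.
RESULTS = [
    (1, 4.9492, 4.8478), (8, 4.7181, 4.5872), (16, 4.5871, 4.4631),
    (24, 4.557, 4.3972), (32, 4.4965, 4.3521), (40, 4.4697, 4.3258),
    (48, 4.4525, 4.3083), (56, 4.4301, 4.2980), (64, 4.4459, 4.2915),
    (72, 4.4382, 4.2893), (80, 4.4315, 4.2866), (88, 4.4178, 4.2861),
    (96, 4.4501, 4.2859), (104, 4.4121, 4.2856), (112, 4.4164, 4.2859),
    (120, 4.4264, 4.2860),
]


def best_checkpoint(results):
    """Return the (step, train_loss, val_loss) row with the lowest validation loss."""
    return min(results, key=lambda row: row[2])


def validation_improvement(results):
    """Drop in validation loss from the first to the last evaluation."""
    return results[0][2] - results[-1][2]


if __name__ == "__main__":
    step, _, val = best_checkpoint(RESULTS)
    print(f"lowest validation loss {val} at step {step}")
    print(f"total improvement {validation_improvement(RESULTS):.4f}")
```

The lowest validation loss (4.2856) occurs at step 104, and the values after step 80 differ only in the fourth decimal place, so the final epoch brought very little further improvement.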

Framework Versions

Transformers 4.51.3
Pytorch 2.6.0+cu124
Datasets 3.5.0
Tokenizers 0.21.1

📄 License

The model follows the MIT license for individuals and organizations using it for research. Commercial use requires a one-time fee of 1 kr for a lifetime license. Read LICENSE.txt for the full license details.
