🚀 syv.ai TTS v0.1
syv.ai TTS v0.1 is our first open-source text-to-speech model. It is trained on over 1000 hours of Danish audio and offers high-quality text-to-speech conversion.
🚀 Quick Start
Because the model is an LLM, you can run inference with popular frameworks such as vLLM or ollama. We recommend following the way inference is implemented in Orpheus.
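As a rough illustration of the vLLM route, the sketch below builds a request for the `/v1/completions` endpoint of vLLM's OpenAI-compatible server, using only the Python standard library. The server URL, the model id `syvai/tts-v0.1`, and the sampling values are placeholders and do not come from this model card. The prompt is passed through unformatted; the prompt template and the step that turns generated tokens into audio should be taken from the Orpheus inference code.

```python
import json
import urllib.request

# Placeholders: these values are assumptions, not taken from the model card.
SERVER_URL = "http://localhost:8000/v1/completions"  # a locally running vLLM server
MODEL_ID = "syvai/tts-v0.1"  # hypothetical repo id; use the id you actually serve


def build_completion_request(prompt: str, max_tokens: int = 1200,
                             temperature: float = 0.6) -> urllib.request.Request:
    """Build (but do not send) a POST request for an OpenAI-compatible completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_completion_request("Hej, hvordan går det?")

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     completion = json.load(response)
```

The request is only constructed here, so the snippet runs without a server. The completion returned by a real server still needs the Orpheus-style post-processing before it can be played back as audio.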
✨ Features
Model Details
The model started as a Llama 3.2 3B model that was first trained on 100,000 hours of English audio and then fine-tuned to speak Danish.
Seeking More Audio
We are looking for more audio data, especially ordinary conversational audio. If you have relevant audio (preferably not read aloud), please contact us.
📚 Documentation
Model
The model started as a Llama 3.2 3B model and was trained in two stages: first on a large amount of English audio, then fine-tuned on Danish. Because it is an LLM, it supports inference with vLLM, ollama, or other popular inference frameworks.
Training Configuration
The model was trained with axolotl version 0.8.0 using the following configuration:
base_model: syvai/tts-v1-pretrained
hub_model_id: syvai/tts-v1-finetuned
plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true
datasets:
  - path: syvai/zac-coral-tts
    type:
dataset_prepared_path: last_run_prepared
val_set_size: 0.01
eval_sample_packing: False
output_dir: ./outputs/finetuned
sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true
wandb_project: orph
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 8
micro_batch_size: 4
num_epochs: 3
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 2e-5
bf16: auto
tf32: false
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
resume_from_checkpoint:
logging_steps: 1
flash_attention: true
warmup_steps: 3
evals_per_epoch: 5
saves_per_epoch: 5
weight_decay: 0.05
special_tokens:
  pad_token: <custom_token_7>
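The `lr_scheduler`, `learning_rate`, and `warmup_steps` entries above describe a short linear warmup followed by cosine decay. The sketch below reproduces the usual shape of such a schedule so the values can be inspected. The function name `lr_at_step` is illustrative, decay to zero is an assumption about the scheduler's defaults, and `TOTAL_STEPS = 120` is taken from the last logged evaluation step in this card and is not necessarily the exact length of the run.

```python
import math


def lr_at_step(step: int, total_steps: int, base_lr: float = 2e-5,
               warmup_steps: int = 3) -> float:
    """Learning rate after `step` optimizer updates: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 120  # assumption for illustration; the real step count depends on the dataset

for step in (0, 1, 3, 60, 120):
    print(f"step {step:>3}: lr = {lr_at_step(step, TOTAL_STEPS):.2e}")
```

With only three warmup steps, the peak learning rate of 2e-5 is reached almost immediately, so nearly the whole run is spent on the decaying part of the curve.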
tts-v1-finetuned
This model is a fine-tuned version of [syvai/tts-v1-pretrained](https://huggingface.co/syvai/tts-v1-pretrained) on the syvai/zac-coral-tts dataset. It achieves a loss of 4.2860 on the evaluation set.
Model Description
More information is needed.
Intended Uses & Limitations
More information is needed.
Training and Evaluation Data
More information is needed.
Training Procedure
Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 3
- num_epochs: 3.0
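The batch-size figures above are related by simple arithmetic, which the sketch below spells out. It assumes the common convention that the total batch size equals the per-device batch size times the gradient accumulation steps times the number of devices. The device count is therefore inferred, not stated in this card. The token figure uses `sequence_len: 8192` from the axolotl configuration and is an upper bound, since padded positions count toward it.

```python
MICRO_BATCH_SIZE = 4             # train_batch_size per device
GRADIENT_ACCUMULATION_STEPS = 8
TOTAL_TRAIN_BATCH_SIZE = 32      # as reported in the list above
SEQUENCE_LEN = 8192              # sequence_len from the axolotl configuration

# Sequences contributed by one device per optimizer step.
per_device_effective = MICRO_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS

# Inferred, assuming total = micro batch * accumulation steps * devices.
implied_devices = TOTAL_TRAIN_BATCH_SIZE // per_device_effective

# Upper bound on token positions per optimizer step (packed and padded sequences).
max_tokens_per_step = TOTAL_TRAIN_BATCH_SIZE * SEQUENCE_LEN

print(f"sequences per optimizer step: {TOTAL_TRAIN_BATCH_SIZE}")
print(f"implied number of devices:    {implied_devices}")
print(f"max tokens per optimizer step: {max_tokens_per_step}")
```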
Training Results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 4.9492 | 0.0246 | 1 | 4.8478 |
| 4.7181 | 0.1969 | 8 | 4.5872 |
| 4.5871 | 0.3938 | 16 | 4.4631 |
| 4.557 | 0.5908 | 24 | 4.3972 |
| 4.4965 | 0.7877 | 32 | 4.3521 |
| 4.4697 | 0.9846 | 40 | 4.3258 |
| 4.4525 | 1.1723 | 48 | 4.3083 |
| 4.4301 | 1.3692 | 56 | 4.2980 |
| 4.4459 | 1.5662 | 64 | 4.2915 |
| 4.4382 | 1.7631 | 72 | 4.2893 |
| 4.4315 | 1.96 | 80 | 4.2866 |
| 4.4178 | 2.1477 | 88 | 4.2861 |
| 4.4501 | 2.3446 | 96 | 4.2859 |
| 4.4121 | 2.5415 | 104 | 4.2856 |
| 4.4164 | 2.7385 | 112 | 4.2859 |
| 4.4264 | 2.9354 | 120 | 4.2860 |
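To make the trend in these results easier to read, the sketch below takes the validation losses from the table and finds the best checkpoint step, the first evaluation at which the improvement becomes negligible, and the perplexity corresponding to the final loss. The plateau threshold of 0.001 is an arbitrary choice for illustration, and the perplexity assumes the reported loss is a natural-log cross-entropy averaged over tokens.

```python
import math

# (step, validation loss) pairs copied from the table above
VALIDATION_LOSS = [
    (1, 4.8478), (8, 4.5872), (16, 4.4631), (24, 4.3972), (32, 4.3521),
    (40, 4.3258), (48, 4.3083), (56, 4.2980), (64, 4.2915), (72, 4.2893),
    (80, 4.2866), (88, 4.2861), (96, 4.2859), (104, 4.2856), (112, 4.2859),
    (120, 4.2860),
]

PLATEAU_THRESHOLD = 0.001  # assumed cutoff for "no meaningful improvement"


def first_plateau_step(history, threshold):
    """Return the first step whose improvement over the previous evaluation is below threshold."""
    for (_, prev_loss), (step, loss) in zip(history, history[1:]):
        if prev_loss - loss < threshold:
            return step
    return None


best_step, best_loss = min(VALIDATION_LOSS, key=lambda pair: pair[1])
final_loss = VALIDATION_LOSS[-1][1]
perplexity = math.exp(final_loss)  # assumes natural-log cross-entropy
plateau_step = first_plateau_step(VALIDATION_LOSS, PLATEAU_THRESHOLD)

print(f"best validation loss {best_loss} at step {best_step}")
print(f"improvement drops below {PLATEAU_THRESHOLD} at step {plateau_step}")
print(f"perplexity at final loss: {perplexity:.1f}")
```

Under these assumptions, most of the improvement happens in the first epoch, and the validation loss is essentially flat through the third.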
Framework Versions
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1
📄 License
The model is released under the MIT license for individuals and organizations that use it for research. Commercial use requires a one-time fee of 1 kr for a lifetime license. See LICENSE.txt for the full license details.