🚀 Speechless
Speechless is a compact, open-source text-to-semantics model that generates semantic representations of audio directly as discrete tokens. Because it bypasses the need for a text-to-speech (TTS) model, it simplifies training and saves resources, especially for low-resource languages.

🚀 Quick Start
Use the example code below to load the model and generate semantic tokens.
```python
import torch
from transformers import pipeline

model_id = "homebrewltd/Speechless-llama3.2-v0.1"

# Load the model as a text-generation pipeline in bfloat16.
# device_map="auto" (requires the accelerate package) places it on available devices.
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The input text is prefixed with <|reserved_special_token_69|>, as in the original example.
output = pipe("<|reserved_special_token_69|>I’m Speechless – A Model Developed by Homebrew Research")
print(output)
# Example output:
# [{'generated_text': '<|reserved_special_token_69|>I’m Speechless – A Model Developed by Homebrew Research.assistant\n\n<|sound_1968|><|sound_0464|><|sound_0642|><|duration_02|><|sound_0634|><|sound_0105|><|duration_02|><|sound_1745|><|duration_02|><|sound_1345|><|sound_0210|><|sound_1312|><|sound_1312|>'}]
```
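The generated text contains the prompt, an `assistant` marker, and then a run of `<|sound_NNNN|>` and `<|duration_NN|>` tokens. The following is a minimal sketch for extracting those tokens from the pipeline output. It assumes every semantic token follows the `<|kind_number|>` pattern seen in the example. The helper names are hypothetical, and the sketch does not interpret what the duration tokens mean.

```python
import re

# Matches the two token kinds visible in the example output (assumed pattern):
# <|sound_NNNN|> semantic sound units and <|duration_NN|> markers.
TOKEN_RE = re.compile(r"<\|(sound|duration)_(\d+)\|>")


def parse_semantic_tokens(generated_text: str) -> list[tuple[str, int]]:
    """Return (kind, value) pairs for every sound/duration token, in order."""
    return [(kind, int(num)) for kind, num in TOKEN_RE.findall(generated_text)]


def sound_ids(generated_text: str) -> list[int]:
    """Return only the integer IDs of the <|sound_NNNN|> tokens."""
    return [value for kind, value in parse_semantic_tokens(generated_text) if kind == "sound"]


# Shortened sample in the same format as the Quick Start output.
sample = (
    "<|reserved_special_token_69|>Hi.assistant\n\n"
    "<|sound_1968|><|sound_0464|><|duration_02|><|sound_1312|>"
)
print(parse_semantic_tokens(sample))
# [('sound', 1968), ('sound', 464), ('duration', 2), ('sound', 1312)]
print(sound_ids(sample))
# [1968, 464, 1312]
```

The prompt's `<|reserved_special_token_69|>` is ignored because the pattern only accepts `sound` and `duration` tokens.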
✨ Features
Speechless is a compact, open-source text-to-semantics model with 1B parameters. It generates semantic representations of audio directly as discrete tokens, bypassing the need for a text-to-speech (TTS) model. Traditional pipelines first generate audio and then process it (TTS → ASR). Speechless removes this step by converting text directly into semantic speech tokens, which simplifies training, saves resources, and makes scaling easier, especially for low-resource languages. Trained on ~400 hours of English and ~1,000 hours of Vietnamese data, it is a core component of the Ichigo v0.5 family.
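Because no audio is produced, pairing text with semantic tokens reduces to one generation call per sentence. The following is a minimal sketch of that loop. It assumes every input is prefixed with the `<|reserved_special_token_69|>` token from the Quick Start and that the semantic sequence starts at the first `<|sound_...|>` token. `build_pairs`, `generate`, and `fake_generate` are hypothetical names used only for illustration.

```python
import re
from typing import Callable

# Assumption: the semantic-token sequence starts at the first <|sound_...|> token.
FIRST_SOUND_RE = re.compile(r"<\|sound_\d+\|>")
# Assumption: every prompt uses the prefix shown in the Quick Start example.
TASK_TOKEN = "<|reserved_special_token_69|>"


def build_pairs(texts: list[str], generate: Callable[[str], str]) -> list[dict]:
    """Pair each input text with the semantic tokens generated for it.

    `generate` maps a prompt to the model's generated text. In real use it
    could wrap the Quick Start pipeline, e.g.
    generate = lambda p: pipe(p)[0]["generated_text"]
    """
    pairs = []
    for text in texts:
        output = generate(TASK_TOKEN + text)
        match = FIRST_SOUND_RE.search(output)
        if match is None:
            continue  # skip outputs that contain no sound tokens
        pairs.append({"text": text, "semantic_tokens": output[match.start():]})
    return pairs


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for the model, mimicking its output format."""
    return prompt + ".assistant\n\n<|sound_0001|><|duration_02|>"


print(build_pairs(["xin chào"], fake_generate))
# [{'text': 'xin chào', 'semantic_tokens': '<|sound_0001|><|duration_02|>'}]
```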
📚 Documentation
Model Summary
| Property | Details |
|---|---|
| Developed by | Homebrew Research |
| Model Architecture | Llama |
| Model Type | Text to Semantics |
| Language(s) | English and Vietnamese |
| License | Apache 2.0 |
Resources
Intended Use
⚠️ Important Note
This model is intended primarily for research. This version focuses on generating semantic representations of audio directly as discrete tokens, eliminating the need for a text-to-speech (TTS) model. Using Speechless in any manner that violates applicable laws or regulations is strictly prohibited.
🔧 Technical Details
Training Specs
| Parameter | Value |
|---|---|
| Epochs | 2 |
| Global Batch Size | 144 |
| Learning Rate | 3e-4 |
| Learning Rate Scheduler | Cosine |
| Optimizer | AdamW |
| Warmup Ratio | 0.05 |
| Weight Decay | 0.01 |
| Max Sequence Length | 512 |
| Clip Grad Norm | 1.0 |
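To show how the schedule-related values combine, here is a minimal pure-Python sketch. It assumes "Cosine" means linear warmup over the first 5% of steps followed by cosine decay to zero, the same shape as Hugging Face's `get_cosine_schedule_with_warmup`. It also assumes warmup steps are rounded up, and the dataset size is hypothetical. AdamW, weight decay 0.01, and gradient clipping at 1.0 are applied by the optimizer and training loop and are not modeled here.

```python
import math

# Values from the Training Specs table.
PEAK_LR = 3e-4
WARMUP_RATIO = 0.05
GLOBAL_BATCH_SIZE = 144
MAX_SEQ_LEN = 512
EPOCHS = 2


def total_steps(num_samples: int) -> int:
    """Optimizer steps for EPOCHS passes at GLOBAL_BATCH_SIZE (partial last batch kept)."""
    return EPOCHS * math.ceil(num_samples / GLOBAL_BATCH_SIZE)


def lr_at(step: int, num_steps: int) -> float:
    """Assumed schedule: linear warmup to PEAK_LR, then cosine decay to 0."""
    warmup_steps = math.ceil(WARMUP_RATIO * num_steps)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, num_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Upper bound on tokens per optimizer step when every sequence is full length.
print(GLOBAL_BATCH_SIZE * MAX_SEQ_LEN)  # 73728

steps = total_steps(num_samples=144_000)  # hypothetical dataset size
print(steps)               # 2000
print(lr_at(0, steps))     # 0.0 (start of warmup)
print(lr_at(100, steps))   # 0.0003 (warmup ends, peak LR)
print(lr_at(steps, steps)) # 0.0 (end of cosine decay)
```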
Evaluation
Vietnamese
| Model Name | Test Dataset | Test Samples | WER (%) |
|---|---|---|---|
| Speechless v0.1 | viet_bud500 | 7500 | 3.99 |
English
| Model Name | Test Dataset | Test Samples | WER (%) |
|---|---|---|---|
| Speechless v0.1 | librispeech_asr | 2620 | 3.27 |
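Both tables report word error rate (WER). This card does not say how the semantic tokens are mapped back to text for scoring, or which text normalization was applied. The following sketch shows only the standard WER definition: word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words, assuming the tables report it as a percentage. `word_error_rate` is a hypothetical helper, not the evaluation code used for these results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the reference prefix processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (0 cost if the words match)
            )
        prev = curr
    return prev[-1] / max(1, len(ref))


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(round(100 * wer, 2))  # 33.33
```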
📄 License
The model is licensed under Apache 2.0.
📖 Citation Information
BibTeX:
```bibtex
@article{Speechless2024,
  title={Speechless},
  author={{Homebrew Research}},
  year={2024},
  month={December},
  url={https://huggingface.co/homebrewltd/Speechless-llama3.2-v0.1}
}
```
👏 Acknowledgement