Speechless Llama3.2 V0.1

Developed by Menlo
Speechless is a compact open-source text-to-semantic model (1 billion parameters) that converts text directly into discrete semantic speech tokens, without relying on a traditional text-to-speech (TTS) model.
Release Time: 12/28/2024

Model Overview

Speechless removes the need for a traditional TTS→ASR pipeline by converting text directly into semantic speech tokens. This simplifies training, saves resources, and scales more easily, especially for low-resource languages.

Model Features

Direct Text-to-Semantic Tokenization
Converts text directly into discrete semantic speech tokens without relying on traditional TTS models.
Multilingual Support
Supports English and Vietnamese, and is particularly suited to low-resource languages.
Efficient Training
Simplifies the training process and saves computational resources.

Model Capabilities

Text-to-Semantic Tokenization
Multilingual Processing
Efficient Resource Utilization

Use Cases

Speech Processing
Text-to-Semantic Tokenization
Converts text directly into semantic speech tokens for subsequent processing or analysis.
Word error rates as low as 3.27 (English) and 3.99 (Vietnamese).
Research
Speech Model Research
Used to study new methods for direct text-to-semantic tokenization.
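
The word error rates quoted under Speech Processing can be read against the standard definition of the metric: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below is a minimal illustration of that definition only. The function name `word_error_rate` and the sample sentences are hypothetical, and the page does not state how the Speechless figures were computed, what text normalization was applied, or whether they are percentages.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: whitespace tokenization, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of six reference words is dropped.
example_wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Here `example_wer` is 1/6, since a single deletion is enough to align the two sentences. Multiplying the result by 100 gives the percentage form in which word error rates are commonly reported.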