Stable-codec-speech-16k: Open-Source Speech Codec Model for Efficient Compression and Generation of Speech Data

Stable Codec Speech 16k

Developed by stabilityai

A high-quality, low-bitrate speech codec model based on the Transformer architecture, designed for speech data compression and generative modeling.

Audio Generation

Safetensors

English

Open Source License: Other

#Low-bitrate speech coding #Transformer codec #Fundamentals of speech synthesis

Downloads 1,072

Release Time: 1/10/2025

Model Overview

This model encodes audio waveforms into discrete tokens, which compresses speech signals efficiently, and decodes those tokens to reconstruct the audio. It serves as a foundational tool for speech generation and understanding applications.

Model Features

High-quality low-bitrate encoding

Compression technology optimized for speech data, achieving low bitrates while maintaining high quality

Generative modeling friendly

Output format is particularly suitable as input or training target for speech generation models

Commercial-friendly license

Free for research, non-commercial, and commercial use by organizations and individuals with annual revenue under US $1 million

Model Capabilities

Speech signal compression

Audio stream transmission optimization

Speech coding research

Fundamental tool for speech synthesis

Use Cases

Communication enhancement

Real-time communication platforms

Optimizing data transmission efficiency for voice calls

Reduced bandwidth requirements while maintaining speech quality

Speech technology development

Text-to-speech systems

Serving as a pre-processing or post-processing component for speech generation models

Conversational AI

Supporting development of voice interaction systems

🚀 stable-codec-speech-16k Model Card

stable-codec-speech-16k is a Transformer-based codec model designed for high-quality, low-bitrate audio coding. The model encodes audio waveforms into discrete tokens and can later decode these tokens to reconstruct the audio waveform.
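The encode-to-tokens and decode-from-tokens round trip can be illustrated with a deliberately simple stand-in. The sketch below is not the model's method: it uses a uniform scalar quantizer with one token per sample, whereas the released model is a learned Transformer codec that emits far fewer tokens. The function names `encode` and `decode`, the 256-level codebook, and the sine test signal are all assumptions made for illustration.

```python
import math

def encode(samples, levels=256):
    """Map each sample in [-1.0, 1.0] to an integer token in [0, levels - 1]."""
    tokens = []
    for x in samples:
        x = max(-1.0, min(1.0, x))  # clip to the valid range
        tokens.append(round((x + 1.0) / 2.0 * (levels - 1)))
    return tokens

def decode(tokens, levels=256):
    """Map tokens back to approximate sample values in [-1.0, 1.0]."""
    return [t / (levels - 1) * 2.0 - 1.0 for t in tokens]

# A short 440 Hz sine at 16 kHz stands in for a speech waveform (toy input).
sample_rate = 16_000
waveform = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(160)]

tokens = encode(waveform)
reconstructed = decode(tokens)

# The round trip is lossy: the error is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(waveform, reconstructed))
print(f"{len(tokens)} tokens, max reconstruction error {max_error:.5f}")
```

The point of the toy is only that decoding returns an approximation of the input rather than a bit-exact copy, which is the trade-off any low-bitrate codec makes.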

Please note that for individuals or organizations with an annual revenue of US $1,000,000 (or local currency equivalent) or more, regardless of the revenue source, you must obtain an enterprise commercial license directly from Stability AI before commercially using Stable Codec, any derivative work of it (such as a “fine tune” model), or their outputs. You can submit a request for an Enterprise License at https://stability.ai/enterprise. Refer to Stability AI's Community License at https://stability.ai/license for more information.

[Figure: model architecture]

✨ Features

High-quality and Low-bitrate Coding: Capable of encoding audio into discrete tokens for high-quality, low-bitrate audio coding.
Foundation for Downstream Applications: Serves as a foundational tool for developing downstream applications in speech understanding and generation.
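To make "low-bitrate" concrete, the bitrate of a token stream follows from how many tokens are produced per second and how many bits each token carries. The sketch below works through that arithmetic and compares it with uncompressed PCM. The frame rate, tokens per frame, and codebook size are illustrative placeholders, not the model's published settings, and the 16 kHz sample rate is an assumption inferred from the "16k" in the model name.

```python
import math

def token_bitrate_bps(frames_per_second, tokens_per_frame, codebook_size):
    """Bits per second of a discrete token stream."""
    bits_per_token = math.log2(codebook_size)
    return frames_per_second * tokens_per_frame * bits_per_token

def pcm_bitrate_bps(sample_rate_hz, bits_per_sample, channels=1):
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels

# Reference: 16 kHz, 16-bit, mono PCM (sample rate assumed from the model name).
pcm = pcm_bitrate_bps(16_000, 16)

# Hypothetical codec settings, chosen only to show the calculation.
codec = token_bitrate_bps(frames_per_second=25, tokens_per_frame=1, codebook_size=2**16)

ratio = pcm / codec
print(f"PCM: {pcm} bps, token stream: {codec:.0f} bps, ratio: {ratio:.0f}x")
```

Substituting the actual frame rate and codebook size from the paper or repository gives the real figure for a given model configuration.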

📚 Documentation

Model Description

Developed by: Stability AI
Model type: Transformer audio codec model
Model details: The released model is a speech codec that compresses real-world speech data into a format suitable for generative modeling. It provides a basis for developing downstream applications in speech understanding and generation, such as text-to-speech systems and conversational AI models. See the arXiv page and GitHub repository for more details.

License

Community License: Free for research, non-commercial, and commercial use by organizations and individuals with an annual revenue of less than US $1,000,000 (or local currency equivalent). If your annual revenue is US $1,000,000 or more, any commercial use of this model or its derivative works requires obtaining an Enterprise License directly from Stability AI. You can submit a request for an Enterprise License at https://stability.ai/enterprise. Refer to Stability AI's Community License at https://stability.ai/license for more information.

Model Sources

Repository: https://github.com/Stability-AI/stable-codec
Audio demos: https://stability-ai.github.io/stable-codec-demo/
arXiv page: https://arxiv.org/abs/2411.19842

Training Dataset

The model was trained on datasets of Creative Commons or public domain audiobook recordings. See the academic paper for more details.

Intended Uses

Efficient compression of speech signals for storage or streaming.
Enhancing speech-based applications, such as telecommunication systems and real-time communication platforms.
Research and development in audio coding and speech synthesis, including understanding and improving codec performance.
Development of downstream applications including speech recognition and generation.

All uses of the model should comply with our Acceptable Use Policy.
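For the storage and streaming use case listed above, the token IDs a codec produces still have to be serialized compactly, otherwise the bitrate savings are lost in the container. One common approach is fixed-width bit packing, sketched below. The helper names `pack_tokens` and `unpack_tokens` and the 10-bit token width are hypothetical and not part of the Stable Codec repository; the actual serialization format, if any, is defined there.

```python
def pack_tokens(tokens, bits_per_token):
    """Pack non-negative integer tokens into bytes, bits_per_token bits each."""
    buffer = 0
    for t in tokens:
        if not 0 <= t < (1 << bits_per_token):
            raise ValueError(f"token {t} does not fit in {bits_per_token} bits")
        buffer = (buffer << bits_per_token) | t
    total_bits = bits_per_token * len(tokens)
    pad = (-total_bits) % 8  # zero bits appended to reach a byte boundary
    buffer <<= pad
    return buffer.to_bytes((total_bits + pad) // 8, "big")

def unpack_tokens(data, bits_per_token, count):
    """Recover `count` tokens from bytes produced by pack_tokens."""
    buffer = int.from_bytes(data, "big")
    total_bits = len(data) * 8
    mask = (1 << bits_per_token) - 1
    tokens = []
    for i in range(count):
        shift = total_bits - bits_per_token * (i + 1)
        tokens.append((buffer >> shift) & mask)
    return tokens

# Example token IDs for a hypothetical 1024-entry codebook (10 bits per token).
example_tokens = [5, 1023, 0, 512]
payload = pack_tokens(example_tokens, bits_per_token=10)
restored = unpack_tokens(payload, bits_per_token=10, count=len(example_tokens))
print(len(payload), "bytes;", "round trip ok" if restored == example_tokens else "mismatch")
```

The receiver needs the token count (or an equivalent length field) because the padding bits at the end are otherwise indistinguishable from a trailing zero token.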

Out-of-Scope Uses

This model is trained on non-overlapping clean English speech and performs best in such scenarios. It's not suitable for applications requiring high-fidelity music or environmental sound coding.

Contact

Report any issues with the model or contact us:

Safety issues: safety@stability.ai
Security issues: security@stability.ai
Privacy issues: privacy@stability.ai
License and general: https://stability.ai/license
Enterprise license: https://stability.ai/enterprise

🚀 Quick Start

For usage instructions, please refer to our GitHub repository: https://github.com/Stability-AI/stable-codec

📄 License

The model is under the stabilityai-ai-community license. By using this model, you agree to the License Agreement and acknowledge Stability AI's Privacy Policy.

⚠️ Important Note

For individuals or organizations generating annual revenue of US $1,000,000 (or local currency equivalent) or more, you must obtain an enterprise commercial license directly from Stability AI before commercial use.

Property	Details
Model Type	Transformer audio codec model
Training Data	Datasets derived from creative commons or public domain audiobook recordings
