🚀 stable-codec-speech-16k Model Card
stable-codec-speech-16k is a Transformer-based codec model designed for high-quality, low-bitrate audio coding. It encodes audio waveforms into discrete tokens, which can later be decoded back into a reconstruction of the original waveform.
Please note that for individuals or organizations with an annual revenue of US $1,000,000 (or local currency equivalent) or more, regardless of the revenue source, you must obtain an enterprise commercial license directly from Stability AI before commercially using Stable Codec, any derivative work of it (such as a “fine tune” model), or their outputs. You can submit a request for an Enterprise License at https://stability.ai/enterprise. Refer to Stability AI's Community License at https://stability.ai/license for more information.

✨ Features
- High-Quality, Low-Bitrate Coding: Encodes audio into discrete tokens for high-quality, low-bitrate audio coding.
- Foundation for Downstream Applications: Serves as a foundational tool for developing downstream applications in speech understanding and generation.
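To make "low-bitrate" concrete, the sketch below works through the arithmetic that links a token stream to a bitrate: tokens per second multiplied by bits per token. The token rate and codebook size are hypothetical values chosen for illustration, not the released model's configuration; see the repository and paper for the actual figures.

```python
import math

def bitrate_bps(tokens_per_second: float, codebook_size: int) -> float:
    """Bits per second for a stream of tokens drawn from a single codebook."""
    bits_per_token = math.log2(codebook_size)
    return tokens_per_second * bits_per_token

def storage_bytes(duration_s: float, bps: float) -> float:
    """Bytes needed to store duration_s seconds at bps bits per second."""
    return duration_s * bps / 8

# Hypothetical settings for illustration only (NOT the model's real values).
TOKENS_PER_SECOND = 25
CODEBOOK_SIZE = 1024  # 10 bits per token

codec_bps = bitrate_bps(TOKENS_PER_SECOND, CODEBOOK_SIZE)
pcm_bps = 16_000 * 16  # uncompressed 16 kHz, 16-bit mono PCM for comparison

print(codec_bps)                      # 250.0
print(pcm_bps / codec_bps)            # 1024.0
print(storage_bytes(60, codec_bps))   # 1875.0
```

Under these assumed numbers, one minute of speech would take under 2 kB as tokens, compared with roughly 1.9 MB as raw PCM.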
📚 Documentation
Model Description
- Developed by: Stability AI
- Model type: Transformer audio codec model
- Model details: The released model is a speech codec that compresses real-world speech data into a format suitable for generative modeling. It provides a basis for developing downstream applications in speech understanding and generation, such as text-to-speech systems and conversational AI models. See the arXiv page and GitHub repository listed under Model Sources for more details.
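The encode-to-tokens, decode-to-waveform idea can be illustrated with a toy example. The sketch below uses a plain uniform scalar quantizer on individual samples; this is a stand-in chosen for simplicity and is not the model's Transformer architecture or its actual quantizer. The names `encode`, `decode`, and `LEVELS` are hypothetical and do not belong to the stable-codec API.

```python
from typing import List

LEVELS = 16  # hypothetical codebook size for this toy example

def encode(samples: List[float], levels: int = LEVELS) -> List[int]:
    """Map samples in [-1.0, 1.0] to integer tokens in [0, levels - 1]."""
    tokens = []
    for x in samples:
        x = max(-1.0, min(1.0, x))  # clip out-of-range input
        tokens.append(round((x + 1.0) / 2.0 * (levels - 1)))
    return tokens

def decode(tokens: List[int], levels: int = LEVELS) -> List[float]:
    """Map integer tokens back to approximate sample values in [-1.0, 1.0]."""
    return [t / (levels - 1) * 2.0 - 1.0 for t in tokens]

waveform = [0.1, 0.5, -0.5, 1.0, -1.0]
tokens = encode(waveform)
reconstruction = decode(tokens)

# The round trip is lossy: each sample is off by at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(waveform, reconstruction))
print(tokens)
print(max_error)
```

The point of the toy is only that the tokens are discrete and the reconstruction is approximate. A learned codec such as this model compresses far more aggressively by modeling structure across time instead of quantizing each sample independently.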
License
- Community License: Free for research, non-commercial, and commercial use by organizations and individuals with an annual revenue of less than US $1,000,000 (or local currency equivalent). If your annual revenue is US $1,000,000 or more, any commercial use of this model or its derivative works requires an Enterprise License obtained directly from Stability AI. You can submit a request for an Enterprise License at https://stability.ai/enterprise. Refer to Stability AI's Community License at https://stability.ai/license for more information.
Model Sources
- Repository: https://github.com/Stability-AI/stable-codec
- Audio demos: https://stability-ai.github.io/stable-codec-demo/
- arXiv page: https://arxiv.org/abs/2411.19842
Training Dataset
The model was trained on datasets drawn from Creative Commons or public domain audiobook recordings. See the arXiv paper for more details.
Intended Uses
- Efficient compression of speech signals for storage or streaming.
- Enhancing speech-based applications, such as telecommunication systems and real-time communication platforms.
- Research and development in audio coding and speech synthesis, including understanding and improving codec performance.
- Development of downstream applications including speech recognition and generation.
All uses of the model should comply with our Acceptable Use Policy.
Out-of-Scope Uses
This model is trained on non-overlapping clean English speech and performs best in such scenarios. It is not suitable for applications requiring high-fidelity music or environmental sound coding.
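As a rough pre-check before sending audio to the codec, the sketch below reads a WAV header with Python's standard `wave` module and reports whether the file is mono at 16 kHz. The 16 kHz expectation is an assumption inferred from the "16k" in the model name, and the helper names are hypothetical. The check covers format only; it cannot tell whether the content is clean, non-overlapping English speech.

```python
import wave

# Assumption: the "16k" in the model name refers to a 16 kHz sample rate.
EXPECTED_SAMPLE_RATE = 16_000

def describe_wav(path: str) -> dict:
    """Read basic format information from a PCM WAV file header."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        return {
            "sample_rate": sample_rate,
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / sample_rate,
        }

def looks_in_scope(info: dict) -> bool:
    """True when the file is mono at the assumed sample rate."""
    return info["sample_rate"] == EXPECTED_SAMPLE_RATE and info["channels"] == 1
```

A file that fails this check would likely need resampling or downmixing first; consult the GitHub repository for the input format the model actually expects.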
Contact
Report any issues with the model or contact us:
- Safety issues: safety@stability.ai
- Security issues: security@stability.ai
- Privacy issues: privacy@stability.ai
- License and general: https://stability.ai/license
- Enterprise license: https://stability.ai/enterprise
🚀 Quick Start
For usage instructions, please refer to our GitHub repository at https://github.com/Stability-AI/stable-codec.
📄 License
This model is released under the stabilityai-ai-community license. By using this model, you agree to the License Agreement and acknowledge Stability AI's Privacy Policy.
⚠️ Important Note
For individuals or organizations generating annual revenue of US $1,000,000 (or local currency equivalent) or more, you must obtain an enterprise commercial license directly from Stability AI before commercial use.
| Property | Details |
|---|---|
| Model Type | Transformer audio codec model |
| Training Data | Datasets derived from Creative Commons or public domain audiobook recordings |