An experimental Japanese speech synthesis model that uses the Parler-TTS prompt architecture and the XCodec2 audio decoder, allowing pitch and background-noise adjustment through control prompts.
Model Features
Prompt Control
Fine-tune voice quality by modifying the control prompt and the reading prompt
Lightweight Design
A 150M-parameter model suitable for deployment in resource-constrained environments
High-Quality Audio Output
Uses the XCodec2 audio decoder to ensure speech quality
Model Capabilities
Japanese Speech Synthesis
Pitch Adjustment
Background Noise Control
Text-to-Speech
Use Cases
Voice Interaction
Virtual Assistant
Provides natural speech output for Japanese virtual assistants, including speech with emotional characteristics
Content Creation
Audio Content Generation
Automatically converts Japanese text to speech, supporting output with different tones and intonations
🚀 Canary-TTS-150M
Canary-TTS-150M is a text-to-speech (TTS) model trained on top of llm-jp/llm-jp-3-150m-instruct3. It adopts the same prompting method as Parler-TTS, allowing fine-grained control of voice quality by changing the control prompt and the reading prompt. It is an experimental model created in the course of training Canary-TTS 0.5B, so using Canary-TTS 0.5B is recommended instead.
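Below is a minimal, untested inference sketch of how such a pipeline might look. It assumes the standard transformers causal-LM interface inherited from the llm-jp base model and the XCodec2 reference decoder; the repository IDs, prompt template, audio-token offset, and 16 kHz sampling rate are placeholders or assumptions rather than this model's documented interface, so consult the official Canary-TTS examples for the exact usage.

```python
# Minimal inference sketch (untested). Repo IDs, prompt template, audio-token
# offset, and sampling rate are ASSUMPTIONS; see the official examples.
import torch
import soundfile as sf
from transformers import AutoModelForCausalLM, AutoTokenizer
from xcodec2.modeling_xcodec2 import XCodec2Model  # decoder import per XCodec2's own examples

model_id = "<canary-tts-150m repo id>"   # placeholder: substitute the actual repository ID
codec_id = "<xcodec2 checkpoint id>"     # placeholder: the public XCodec2 checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()
codec = XCodec2Model.from_pretrained(codec_id).eval()

# Parler-TTS-style prompting: a control prompt describing the desired voice
# and a reading prompt with the text to speak (template is illustrative only).
control_prompt = "落ち着いた低めの声で、背景ノイズのないクリアな音声。"
reading_prompt = "こんにちは、今日はいい天気ですね。"
prompt = f"{control_prompt}\n{reading_prompt}"

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=True)

# Assumed post-processing: keep only newly generated tokens and shift them back
# into the XCodec2 codebook range (the offset depends on the model's vocabulary).
AUDIO_TOKEN_OFFSET = 0  # placeholder value
new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
codes = (new_tokens - AUDIO_TOKEN_OFFSET).unsqueeze(0).unsqueeze(0)  # (1, 1, T)

with torch.no_grad():
    waveform = codec.decode_code(codes)  # (1, 1, samples)

sf.write("output.wav", waveform[0, 0].cpu().numpy(), 16000)  # assumed 16 kHz output
```

In this scheme the control prompt describes the desired voice (for example pitch, background noise, speaking style), while the reading prompt carries the Japanese text to be read aloud.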
The creator makes no guarantees regarding the accuracy, legality, or appropriateness of the results obtained from using this model.
When using this model, users must comply with all applicable laws and regulations. All responsibilities arising from the generated content shall be borne by the user.
The creator of this repository and the model shall not be held liable for any copyright infringement or other legal issues.
In the event of a copyright issue, the problematic resources or data will be promptly deleted.