VoiceRestore is a model specifically designed for repairing damaged voice recordings, capable of handling background noise, reverberation, distortion, and signal loss
Model Features
Comprehensive Restoration
Capable of handling any degree and type of voice recording damage
Easy to Use
Provides a simple interface for processing damaged audio
Pre-trained Model
Includes a pre-trained transformer model with 301 million parameters
Model Capabilities
Noise Reduction
Reverberation Elimination
Distortion Repair
Signal Loss Recovery
Use Cases
Voice Restoration
Damaged Recording Repair
Repairs voice recordings with background noise, distortion, and other issues
Significantly improves speech clarity and intelligibility
Historical Recording Restoration
Processes low-quality audio recorded by old recording devices
Restores original voice characteristics
đ VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration
VoiceRestore is a state-of-the-art speech restoration model. It can greatly improve the quality of degraded voice recordings. By using flow-matching transformers, this model can effectively deal with various audio imperfections in speech, such as background noise, reverberation, distortion, and signal loss.
It is based on this repo & demo of audio restorations: VoiceRestore
from transformers import AutoModel
# path to the model folder (on colab it's as follows)
checkpoint_path = "/content/VoiceRestore"
model = AutoModel.from_pretrained(checkpoint_path, trust_remote_code=True)
model("test_input.wav", "test_output.wav")
#add short=False if audio is > 10 seconds
model("long.mp3", "long_output.mp3", short=False)
đģ Usage Examples
Basic Usage
from transformers import AutoModel
# path to the model folder (on colab it's as follows)
checkpoint_path = "/content/VoiceRestore"
model = AutoModel.from_pretrained(checkpoint_path, trust_remote_code=True)
model("test_input.wav", "test_output.wav")
Advanced Usage
# If the audio is longer than 10 seconds, add the short=False parameter
model("long.mp3", "long_output.mp3", short=False)
đ Example
Degraded Input:
Degraded Input Audio
Restored (steps=32, cfg=1.0):
Restored audio - 16 steps, strength 0.5:
⨠Features
Universal Restoration: The model can handle any level and type of voice recording degradation. It's truly amazing.
Easy to Use: It has a simple interface for processing degraded audio files.
Pretrained Model: It includes a 301 million parameter transformer model with pre-trained weights. (The model is still under training, and there will be further checkpoint updates)
đ Documentation
Model Details
Property
Details
Model Type
Flow-matching transformer
Parameters
300M+ parameters
Input
Degraded speech audio (various formats supported)
Output
Restored speech
Limitations and Future Work
The current model is optimized for speech and may not perform optimally on music or other audio types.
Ongoing research is being conducted to improve performance on extreme degradations.
Future updates may include real-time processing capabilities.
Citation
If you use VoiceRestore in your research, please cite our paper: