The open-source voice cloning model voice - clone - large - finetune - final, precisely designed for voice recognition tasks!

Voice Clone Large Finetune Final

Developed by neuronbit

This model is a voice cloning model fine-tuned based on openai/whisper-large-v3, primarily used for speech recognition tasks, achieving a word error rate of 15.3572 on the evaluation set.

Speech Recognition

Transformers

Open Source License:Apache-2.0 #Voice Cloning #Low Word Error Rate #Whisper Fine-tuning

Downloads 37

Release Time : 11/27/2024

Model Overview

A speech recognition model fine-tuned on Whisper-large-v3, focusing on improving speech recognition accuracy in specific scenarios.

Model Features

Low Word Error Rate

Achieves a word error rate of 15.3572 on the evaluation set, outperforming many general-purpose speech recognition models.

Fine-tuning

Deeply fine-tuned based on Whisper-large-v3 to adapt to specific speech recognition scenarios.

Efficient Training

Utilizes mixed-precision training and gradient accumulation techniques to optimize training efficiency.

Model Capabilities

Speech Recognition

Speech-to-Text

Audio Content Analysis

Use Cases

Speech Transcription

Meeting Minutes

Automatically convert meeting recordings into text transcripts.

Word error rate 15.3572

Voice Notes

Convert voice memos into searchable text.

Speech Analysis

Speech Content Analysis

Analyze audio content and extract key information.

Training Loss	Epoch	Step	Validation Loss	Wer
0.1607	0.8460	250	0.5163	25.9413
0.0598	1.6920	500	0.4849	24.8444
0.0257	2.5381	750	0.4450	30.4180
0.0141	3.3841	1000	0.4369	19.3003
0.0029	4.2301	1250	0.4267	16.0095
0.0015	5.0761	1500	0.4209	18.4109
0.0063	5.9222	1750	0.4259	19.3300
0.0016	6.7682	2000	0.4341	17.7587
0.0009	7.6142	2250	0.4121	17.0471
0.0013	8.4602	2500	0.4199	16.3653
0.0009	9.3063	2750	0.4233	16.5135
0.001	10.1523	3000	0.4237	16.0688
0.0019	10.9983	3250	0.4230	16.4542
0.0014	11.8443	3500	0.4292	15.8316
0.0007	12.6904	3750	0.4291	15.8316
0.0005	13.5364	4000	0.4321	15.3869
0.0009	14.3824	4250	0.4334	15.2980
0.001	15.2284	4500	0.4344	15.2980
0.0	16.0745	4750	0.4372	15.3572
0.0	16.9205	5000	0.4377	15.3572

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご

Voice Clone Large Finetune Final

Model Overview

Model Features

Model Capabilities

Use Cases

🚀 voice-clone-large-finetune-final

🚀 Quick Start

✨ Features

📚 Documentation

Model description

Intended uses & limitations

Training and evaluation data

Training procedure

Training hyperparameters

Training results

Framework versions

📄 License