ReazonSpeech NeMo v2
A Japanese automatic speech recognition (ASR) model trained on the ReazonSpeech v2.0 corpus, with support for long-audio inference
Downloads 3,897
Release date: January 30, 2024
Model Overview
This model is an automatic speech recognition system optimized for Japanese, capable of processing continuous speech inputs lasting several hours.
Model Features
Long audio processing capability
Supports continuous recognition of long Japanese audio recordings lasting several hours
Efficient attention mechanism
Uses a Longformer attention mechanism with a local context size of 256, combined with global tokens
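To illustrate why this attention pattern suits long inputs, the following is a minimal sketch of a Longformer-style mask in plain Python. It is not the model's implementation. The function name, the `window` and `num_global` parameters, and the small sizes in the demo are all assumptions chosen for readability. In particular, the sketch treats the window as a per-side radius, which may not match how the model's context size of 256 is defined.

```python
def local_attention_mask(seq_len, window, num_global=1):
    """Return mask[i][j] == True when position i may attend to position j.

    Hypothetical helper for illustration only:
    - `window` is the number of neighbours visible on each side (assumption).
    - The first `num_global` positions act as global tokens: they see every
      position and every position sees them.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            in_window = abs(i - j) <= window
            is_global = i < num_global or j < num_global
            row.append(in_window or is_global)
        mask.append(row)
    return mask


def count_attended_pairs(mask):
    """Count the (query, key) pairs that the mask allows."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    # Tiny demo sizes; a real model would use far longer sequences.
    demo = local_attention_mask(seq_len=8, window=2, num_global=1)
    for row in demo:
        print("".join("x" if allowed else "." for allowed in row))

    # Allowed pairs grow roughly linearly with length, not quadratically.
    for n in (100, 200, 400):
        pairs = count_attended_pairs(local_attention_mask(n, window=4))
        print(n, pairs, n * n)
```

Because each position attends to a fixed-size neighbourhood plus the global tokens, the number of allowed pairs grows roughly in proportion to the sequence length, whereas full attention grows with its square.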
Optimized training
Trained for 1 million steps using the AdamW optimizer with a Noam annealing learning-rate schedule
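For readers unfamiliar with Noam annealing, here is a small sketch of the commonly cited form of the schedule: a linear warm-up followed by decay proportional to the inverse square root of the step. The `d_model`, `warmup_steps`, and `scale` values below are placeholder assumptions, since the listing does not state the model's actual hyperparameters. Only the 1 million step total comes from the description above.

```python
def noam_lr(step, d_model=512, warmup_steps=10_000, scale=1.0):
    """Learning rate at a 1-based training step under a Noam-style schedule.

    All default values are illustrative assumptions, not the model's settings.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    warmup_term = step * warmup_steps ** -1.5  # linear ramp during warm-up
    decay_term = step ** -0.5                  # inverse square-root decay
    return scale * d_model ** -0.5 * min(warmup_term, decay_term)


if __name__ == "__main__":
    # The rate rises until warmup_steps, then decays through step 1,000,000.
    for step in (1, 1_000, 10_000, 100_000, 1_000_000):
        print(f"step {step:>9,}: lr = {noam_lr(step):.3e}")
```

The two terms inside `min` are equal exactly at `warmup_steps`, which is where the learning rate peaks.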
Model Capabilities
Japanese speech recognition
Long audio processing
Continuous speech transcription
Use Cases
Speech transcription
Automatic meeting minutes generation
Automatically converts long business meeting recordings into text transcripts
Media content subtitle generation
Automatically generates subtitles for Japanese podcasts, videos, and other content
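As a rough sketch of the subtitle use case, the snippet below converts timestamped transcript segments into SubRip (SRT) text. The `(start_seconds, end_seconds, text)` tuple layout and both function names are hypothetical and chosen for illustration. The listing does not describe the model's output format, so real output would need to be adapted to this shape first.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, total_ms = divmod(total_ms, 3_600_000)
    minutes, total_ms = divmod(total_ms, 60_000)
    secs, millis = divmod(total_ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Build SRT text from (start_seconds, end_seconds, text) tuples.

    The tuple layout is an assumption made for this example.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # A blank line separates consecutive subtitle entries.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Made-up segments standing in for recognizer output.
    example = [
        (0.0, 2.4, "おはようございます"),
        (2.4, 5.1, "本日の会議を始めます"),
    ]
    print(segments_to_srt(example))
```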