kan-bayashi_csj_asr_train Model: Free, Open-Source Japanese Automatic Speech Recognition

kan-bayashi_csj_asr_train_asr_transformer_raw_char_sp_valid.acc.ave

Developed by ESPnet

This is a Japanese automatic speech recognition (ASR) model built on the Transformer architecture and trained with the ESPnet framework on the CSJ dataset.

Categories: Speech Recognition, Japanese. Tags: #Japanese speech recognition #End-to-end model #Academic lecture transcription


Release time: 3/2/2022

Model Overview

This model is an end-to-end Japanese speech recognition model capable of converting Japanese speech into text. It was developed using the ESPnet toolkit and trained on the CSJ (Corpus of Spontaneous Japanese) dataset.

Model Features

End-to-end speech recognition

A single network is trained to map speech input directly to text output, rather than chaining separately trained acoustic, pronunciation, and language models.

Transformer-based architecture

Employs the Transformer architecture, whose self-attention layers model long-range dependencies across the input sequence.

Trained on a spontaneous Japanese speech corpus

Trained on CSJ (Corpus of Spontaneous Japanese), which consists mainly of academic presentations and simulated public speaking, so the model is best suited to spontaneous, lecture-style Japanese speech.
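In ESPnet recipe naming, the `sp` in this model's name indicates that the training data were augmented with speed perturbation: each utterance is also resampled at speed factors such as 0.9 and 1.1, which changes both tempo and pitch. The recipe typically does this with `sox` during data preparation; the sketch below is a simplified NumPy stand-in using linear interpolation, and the function names `speed_perturb` and `augment` and the default factors are illustrative assumptions.

```python
from typing import List, Sequence

import numpy as np


def speed_perturb(speech: np.ndarray, factor: float) -> np.ndarray:
    """Play `speech` back `factor` times faster (factor > 1 shortens it).

    Simplified stand-in for sox's `speed` effect: plain linear
    interpolation with no anti-aliasing filter.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(round(len(speech) / factor))
    # Position in the original signal that each output sample reads from.
    src_positions = np.arange(n_out) * factor
    return np.interp(src_positions, np.arange(len(speech)), speech)


def augment(speech: np.ndarray, factors: Sequence[float] = (0.9, 1.0, 1.1)) -> List[np.ndarray]:
    """Return one copy of the utterance per speed factor (1.0 keeps the original)."""
    return [speed_perturb(speech, f) for f in factors]
```

Tripling the training set this way is a cheap way to expose the model to more speaking-rate variation without collecting new recordings.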

Model Capabilities

Japanese speech recognition

Speech-to-text

Automatic transcription
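Because the model emits characters (the `char` in its name) rather than words, transcripts are usually scored against references with character error rate (CER): the minimum number of character substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length. A minimal sketch follows; ignoring whitespace is an assumption made here because Japanese text is not space-delimited, and the function name `cer` is illustrative.

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate via Levenshtein distance, ignoring whitespace."""
    ref = [c for c in reference if not c.isspace()]
    hyp = [c for c in hypothesis if not c.isspace()]
    if not ref:
        raise ValueError("reference must contain at least one character")
    # prev[j] = edit distance between the first i-1 reference chars and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of r
                cur[j - 1] + 1,          # insertion of h
                prev[j - 1] + (r != h),  # substitution (free if equal)
            )
        prev = cur
    return prev[-1] / len(ref)
```

For example, `cer("こんにちは", "こんにちわ")` is one substitution over five reference characters, or 0.2.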

Use Cases

Speech transcription

Automatic meeting transcription

Automatically converts Japanese meeting recordings into text transcripts.

Japanese voice input

Provides Japanese voice input functionality for applications.

Assistive tools

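Hearing impairment assistance

Provides speech-to-text transcription for individuals with hearing impairments. The Transformer model decodes complete utterances rather than streaming, so live captioning would require splitting audio into short segments.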

🚀 Example ESPnet2 ASR model

This is an ESPnet2 ASR model trained on the CSJ dataset.

🚀 Quick Start

This model was trained by kan-bayashi using the csj/asr1 recipe in ESPnet. It was imported from https://zenodo.org/record/4037458/.
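The suffix `valid.acc.ave` follows ESPnet2's convention for the released checkpoint: the parameters of the training epochs with the highest validation accuracy are averaged element-wise into a single model. The simplified sketch below assumes each checkpoint is a `(validation_accuracy, parameters)` pair, with NumPy arrays standing in for PyTorch tensors; the function name `average_nbest` is hypothetical, and `n` is left as a parameter because this card does not state the value used.

```python
from typing import Dict, List, Tuple

import numpy as np

Params = Dict[str, np.ndarray]


def average_nbest(checkpoints: List[Tuple[float, Params]], n: int) -> Params:
    """Average the parameters of the n checkpoints with the best validation accuracy."""
    if not 0 < n <= len(checkpoints):
        raise ValueError("n must be between 1 and the number of checkpoints")
    # Rank by validation accuracy, highest first, and keep the top n.
    best = sorted(checkpoints, key=lambda ckpt: ckpt[0], reverse=True)[:n]
    names = best[0][1].keys()
    # Element-wise mean of each parameter array across the selected checkpoints.
    return {name: np.mean([params[name] for _, params in best], axis=0) for name in names}
```

Averaging nearby checkpoints tends to smooth out epoch-to-epoch noise in the weights, which is why it is often preferred over simply keeping the single best epoch.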

💻 Usage Examples

Basic Usage

No official usage example has been published for this model yet.
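An unofficial sketch of transcription with ESPnet2's `Speech2Text` inference class follows. It assumes that `espnet` and `espnet_model_zoo` are installed, that the model is available on the Hugging Face Hub under the ID `espnet/kan-bayashi_csj_asr_train_asr_transformer_raw_char_sp_valid.acc.ave` (inferred from the model name above), and that input audio is 16 kHz mono to match CSJ. The helper names `load_speech2text` and `transcribe` are hypothetical.

```python
from typing import Any, Callable, Sequence

import numpy as np

# Assumption: Hub ID inferred from this card's model name; verify before use.
MODEL_TAG = "espnet/kan-bayashi_csj_asr_train_asr_transformer_raw_char_sp_valid.acc.ave"
# Assumption: CSJ audio is 16 kHz, so the model expects 16 kHz mono input.
EXPECTED_FS = 16000


def load_speech2text(model_tag: str = MODEL_TAG):
    """Download the model and build an ESPnet2 Speech2Text decoder on CPU."""
    # Imported lazily so the rest of this module works without espnet installed.
    from espnet2.bin.asr_inference import Speech2Text

    return Speech2Text.from_pretrained(model_tag, device="cpu", beam_size=10)


def transcribe(
    speech2text: Callable[[np.ndarray], Sequence[Any]],
    speech: np.ndarray,
    fs: int,
) -> str:
    """Return the best hypothesis text for one mono utterance."""
    if fs != EXPECTED_FS:
        raise ValueError(f"expected {EXPECTED_FS} Hz audio, got {fs} Hz; resample first")
    if speech.ndim != 1:
        raise ValueError("expected a mono (1-D) waveform")
    nbest = speech2text(speech)
    # Each n-best entry is (text, tokens, token_ids, hypothesis); keep the top text.
    text, *_ = nbest[0]
    return text
```

For example, after `speech, fs = soundfile.read("lecture.wav")`, calling `transcribe(load_speech2text(), speech, fs)` returns the top hypothesis. Passing the recognizer in as an argument keeps the model download out of `transcribe`, so the helper can be exercised with a stand-in callable.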

📚 Documentation

Citing ESPnet

@inproceedings{watanabe2018espnet,
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  title={{ESPnet}: End-to-End Speech Processing Toolkit},
  year={2018},
  booktitle={Proceedings of Interspeech},
  pages={2207--2211},
  doi={10.21437/Interspeech.2018-1456},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}
@inproceedings{hayashi2020espnet,
  title={{ESPnet-TTS}: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit},
  author={Hayashi, Tomoki and Yamamoto, Ryuichi and Inoue, Katsuki and Yoshimura, Takenori and Watanabe, Shinji and Toda, Tomoki and Takeda, Kazuya and Zhang, Yu and Tan, Xu},
  booktitle={Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={7654--7658},
  year={2020},
  organization={IEEE}
}

or arXiv:

@misc{watanabe2018espnet,
      title={ESPnet: End-to-End Speech Processing Toolkit}, 
      author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Enrique Yalta Soplin and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
      year={2018},
      eprint={1804.00015},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

📄 License

This model is licensed under CC BY 4.0 (cc-by-4.0).

Property	Details
Tags	espnet, audio, automatic-speech-recognition
Language	ja
Datasets	csj
License	cc-by-4.0
