Visual-novel-transcriptor: an openly shared Japanese audio transcription model optimized for visual novel dialogue

Visual Novel Transcriptor

Developed by spow12

A Japanese speech recognition model fine-tuned from distil-whisper/distil-large-v2 and optimized for transcribing visual novel audio.

Speech Recognition

Transformers

Supports Multiple Languages

#Japanese audio transcription #Visual novel optimization #Anime content recognition

Downloads 31

Release Time : 4/15/2024

Model Overview

This is an automatic speech recognition (ASR) model that converts Japanese speech into text. It is particularly suited to dialogue from visual novels.

Model Features

Visual Novel Scenario Optimization

Fine-tuned on data that includes visual novel dialogue, targeting that style of voiced audio.

Japanese Recognition Capability

Focused on Japanese speech recognition and intended primarily for Japanese audio.

Lightweight Model

Based on distil-whisper, a distilled and lighter variant of Whisper, which reduces computational resource requirements relative to the full-size model while aiming to retain accuracy.

Model Capabilities

Japanese speech-to-text

English speech-to-text

Visual novel dialogue recognition

Use Cases

Anime-related applications

Visual novel transcription

Convert Japanese dialogues in visual novels into text

Generate editable dialogue text

Anime speech recognition

Recognize Japanese dialogue content in anime

Generate subtitles or scripts
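
For the subtitle use case, recognized lines still need timestamps and a subtitle container. As a minimal sketch (not part of the original model card), the following formats already-transcribed lines as SubRip (SRT) text. The `format_timestamp` and `to_srt` helpers and the `(start, end, text)` tuple layout are hypothetical names introduced for illustration; the timings are assumed to come from your own audio segmentation, since the model card does not describe timestamp output.

```python
from typing import Iterable, Tuple


def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(lines: Iterable[Tuple[float, float, str]]) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(lines, start=1):
        # Each SRT cue: running index, time range, then the subtitle text
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by a blank line
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical cues; in practice `text` would be the model's transcription
    print(to_srt([(0.0, 1.5, "こんにちは"), (2.0, 4.25, "お元気ですか")]))
```

The resulting string can be written to a `.srt` file with UTF-8 encoding so that Japanese text is preserved.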

🚀 Model Card for Visual-novel-transcriptor

This is a fine-tuned Automatic Speech Recognition (ASR) model designed to transcribe Japanese audio, especially from visual novels. It is based on the distil-whisper/distil-large-v2 model.

🚀 Quick Start

Model Details

Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. The model card was generated automatically.

Property	Details
Developed by	spow12 (yw_nam)
Shared by	spow12 (yw_nam)
Model Type	Seq2Seq
Language(s) (NLP)	Japanese
Fine-tuned from model	distil-whisper/distil-large-v2

WaifuModel Collections

Unified Demo

WaifuAssitant

💻 Usage Examples

Basic Usage

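import librosa
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

# Path to the audio file to transcribe (placeholder; replace with your own file)
wav_path = "path/to/audio.wav"

# Load the processor and model; .cuda() assumes a CUDA-capable GPU is available
processor = AutoProcessor.from_pretrained('spow12/Visual-novel-transcriptor', language="ja", task="transcribe")
model = AutoModelForSpeechSeq2Seq.from_pretrained('spow12/Visual-novel-transcriptor').cuda()
# Force Japanese transcription prompts during decoding
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="ja", task="transcribe")

# Load the audio resampled to 16 kHz, matching the sampling_rate passed to the processor
data, _ = librosa.load(wav_path, sr=16000)
input_features = processor(data, sampling_rate=16000, return_tensors="pt").input_features.cuda()
# Generate token IDs and decode them to text
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])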
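
Whisper-family models, including distil-whisper, operate on 30-second input windows, so the snippet above suits short clips such as individual voice lines. For longer recordings, one option is to split the waveform into windows and transcribe each in turn. The following is a minimal sketch under that assumption and is not part of the original model card; `split_into_windows` and its parameters are hypothetical names introduced for illustration, and naive fixed-length splitting may cut a word at a window boundary.

```python
from typing import Iterator, Sequence, Tuple

SAMPLE_RATE = 16000   # same rate used when loading audio in the usage example
WINDOW_SECONDS = 30   # Whisper-style models consume 30-second windows


def split_into_windows(
    samples: Sequence[float],
    sample_rate: int = SAMPLE_RATE,
    window_seconds: float = WINDOW_SECONDS,
) -> Iterator[Tuple[float, Sequence[float]]]:
    """Yield (start_time_in_seconds, window) pairs covering `samples` in order."""
    window_size = int(sample_rate * window_seconds)
    if window_size <= 0:
        raise ValueError("window must contain at least one sample")
    # Slicing works for Python lists and NumPy arrays alike
    for start in range(0, len(samples), window_size):
        yield start / sample_rate, samples[start:start + window_size]


if __name__ == "__main__":
    # 70 seconds of silence stands in for real audio
    fake_audio = [0.0] * (70 * SAMPLE_RATE)
    for start_time, window in split_into_windows(fake_audio):
        print(f"{start_time:.1f}s: {len(window) / SAMPLE_RATE:.1f}s of audio")
```

Each yielded window can be passed through the processor and `model.generate` in the same way as `data` in the usage example, and the per-window transcriptions joined in order.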

🔧 Technical Details

Bias, Risks, and Limitations

This model was trained on a Japanese dataset that includes visual novels, which may contain NSFW content.

Use & Credit

This model is currently available for non-commercial use only. The developer notes that they are not well-versed in licensing and asks users to use the model responsibly. By sharing it, the developer hopes to contribute to the research efforts of the open-source community and anime enthusiasts.

Citation

@misc{Visual-novel-transcriptor,
    author       = {YoungWoo Nam},
    title        = {Visual-novel-transcriptor},
    year         = {2024},
    url          = {https://huggingface.co/spow12/Visual-novel-transcriptor},
    publisher    = {Hugging Face}
}
