🚀 Japanese HuBERT Base Phoneme CTC Model
This model is rinna/japanese-hubert-base fine-tuned with a CTC head for Japanese phoneme recognition; it outputs phoneme sequences and is intended as a building block for Japanese speech processing tasks.
✨ Key Features
- Training data: rinna/japanese-hubert-base is fine-tuned on the ReazonSpeech v2 dataset, using phoneme labels generated by pyopenjtalk-plus as ground truth (a label-generation sketch follows this list).
- Checkpoint selection: after roughly 0.3 epochs of training, the checkpoint with the best accuracy on the JSUT corpus (labels: https://github.com/sarulab-speech/jsut-label) was selected.
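For reference, phoneme labels of this kind can be produced with pyopenjtalk-plus. The snippet below is a minimal sketch, not the authors' labeling pipeline; it assumes pyopenjtalk-plus is installed as a drop-in replacement for pyopenjtalk (module name `pyopenjtalk`) and uses its standard `g2p` API.

```python
# Minimal grapheme-to-phoneme sketch with pyopenjtalk-plus.
# pyopenjtalk-plus installs under the module name "pyopenjtalk"; this is only
# an illustration of the g2p step, not the authors' exact labeling pipeline.
import pyopenjtalk

text = "こんにちは"
phonemes = pyopenjtalk.g2p(text)  # space-separated phoneme string
print(phonemes)  # e.g. "k o N n i ch i w a"
```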
📦 Installation
No installation steps are provided; install the transformers library by following its official installation guide.
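As a reference, one possible environment setup covering the libraries imported by the usage example below (the package selection is an assumption, not taken from the original documentation):

```bash
# Hypothetical setup: installs the libraries used in the usage example.
pip install transformers torch librosa numpy
```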
💻 Usage Examples
Basic Usage
```python
import librosa
import numpy as np
import torch
from transformers import HubertForCTC, Wav2Vec2Processor

MODEL_NAME = "prj-beatrice/japanese-hubert-base-phoneme-ctc"

model = HubertForCTC.from_pretrained(MODEL_NAME)
processor = Wav2Vec2Processor.from_pretrained(MODEL_NAME)

# Load the audio as 16 kHz mono, matching the model's expected sampling rate.
audio, sr = librosa.load("audio.wav", sr=16000)

# Pad with 1 s of silence at the start and 0.5 s at the end.
audio = np.concatenate([np.zeros(sr), audio, np.zeros(sr // 2)])

inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Greedy CTC decoding: frame-wise argmax over the vocabulary, then collapse
# repeats and blanks into a space-separated phoneme string.
predicted_ids = outputs.logits.argmax(-1)
phonemes = processor.decode(predicted_ids[0], spaces_between_special_tokens=True)
print(phonemes)
```
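For multiple files, the processor can also pad a batch of waveforms. The following is a sketch under the assumption that all clips are 16 kHz mono; the file names are placeholders.

```python
# Sketch of batched inference; "a.wav" / "b.wav" are placeholder file names.
import librosa
import torch
from transformers import HubertForCTC, Wav2Vec2Processor

MODEL_NAME = "prj-beatrice/japanese-hubert-base-phoneme-ctc"
model = HubertForCTC.from_pretrained(MODEL_NAME)
processor = Wav2Vec2Processor.from_pretrained(MODEL_NAME)

waveforms = [librosa.load(path, sr=16000)[0] for path in ["a.wav", "b.wav"]]

# padding=True zero-pads the shorter clips so they form a single batch.
inputs = processor(waveforms, sampling_rate=16000, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits

predicted_ids = logits.argmax(-1)
phonemes = processor.batch_decode(predicted_ids, spaces_between_special_tokens=True)
print(phonemes)  # one phoneme string per input clip
```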
📚 Detailed Documentation
Model Overview
- rinna/japanese-hubert-base was fine-tuned on the ReazonSpeech v2 dataset, using phoneme labels generated by pyopenjtalk-plus as ground truth.
- After roughly 0.3 epochs of training, the checkpoint with the highest accuracy on the JSUT corpus (labels: https://github.com/sarulab-speech/jsut-label) was selected.
Hyperparameters
- Learning rate (see the configuration sketch after this list):
  - CTC head: 2e-5
  - All other parameters: 2e-6
- Batch size: 32
- Maximum audio samples: 250,000
- Optimizer: AdamW
  - betas: (0.9, 0.98)
  - Weight decay: 0.01
- Learning rate schedule: cosine
  - Warmup steps: 10,000
  - Maximum steps: 800,000 (training was stopped at 200,000 steps because accuracy on JSUT had stopped improving)
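The snippet below is a sketch of an optimizer/scheduler setup matching these hyperparameters, not the authors' training script; `NUM_PHONEMES` is a hypothetical placeholder for the phoneme vocabulary size.

```python
# Optimizer/scheduler sketch implied by the hyperparameters above.
import torch
from transformers import HubertForCTC, get_cosine_schedule_with_warmup

NUM_PHONEMES = 45  # placeholder vocabulary size (CTC blank + phoneme set)
model = HubertForCTC.from_pretrained(
    "rinna/japanese-hubert-base", vocab_size=NUM_PHONEMES
)

# Separate learning rates: 2e-5 for the newly initialized CTC head,
# 2e-6 for the pretrained HuBERT encoder.
optimizer = torch.optim.AdamW(
    [
        {"params": model.lm_head.parameters(), "lr": 2e-5},
        {"params": model.hubert.parameters(), "lr": 2e-6},
    ],
    betas=(0.9, 0.98),
    weight_decay=0.01,
)

# Cosine decay with 10,000 warmup steps over the planned 800,000 steps
# (training was stopped early at 200,000 steps).
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=10_000, num_training_steps=800_000
)
```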
Training Environment
- Hardware: A100 80GB
- Software: Python 3.10.12, with the following package versions:

```
absl-py==2.3.0
accelerate==1.7.0
aiohappyeyeballs==2.6.1
aiohttp==3.12.13
aiosignal==1.3.2
annotated-types==0.7.0
async-timeout==5.0.1
attrs==25.3.0
audioread==3.0.1
certifi==2025.6.15
cffi==1.17.1
charset-normalizer==3.4.2
click==8.2.1
coloredlogs==15.0.1
coverage==7.9.1
datasets==3.6.0
decorator==5.2.1
dill==0.3.8
evaluate==0.4.3
exceptiongroup==1.3.0
filelock==3.18.0
flatbuffers==25.2.10
frozenlist==1.7.0
fsspec==2025.3.0
gitdb==4.0.12
gitpython==3.1.44
grpcio==1.73.0
hf-xet==1.1.3
huggingface-hub==0.33.0
humanfriendly==10.0
idna==3.10
iniconfig==2.1.0
jinja2==3.1.6
jiwer==3.1.0
joblib==1.5.1
lazy-loader==0.4
librosa==0.11.0
llvmlite==0.44.0
markdown==3.8
markupsafe==3.0.2
mpmath==1.3.0
msgpack==1.1.1
multidict==6.4.4
multiprocess==0.70.16
networkx==3.4.2
numba==0.61.2
numpy==2.2.6
nvidia-cublas-cu12==12.6.4.1
nvidia-cuda-cupti-cu12==12.6.80
nvidia-cuda-nvrtc-cu12==12.6.77
nvidia-cuda-runtime-cu12==12.6.77
nvidia-cudnn-cu12==9.5.1.17
nvidia-cufft-cu12==11.3.0.4
nvidia-cufile-cu12==1.11.1.6
nvidia-curand-cu12==10.3.7.77
nvidia-cusolver-cu12==11.7.1.2
nvidia-cusparse-cu12==12.5.4.2
nvidia-cusparselt-cu12==0.6.3
nvidia-nccl-cu12==2.26.2
nvidia-nvjitlink-cu12==12.6.85
nvidia-nvtx-cu12==12.6.77
onnxruntime==1.22.0
packaging==25.0
pandas==2.3.0
platformdirs==4.3.8
pluggy==1.6.0
pooch==1.8.2
propcache==0.3.2
protobuf==6.31.1
psutil==7.0.0
pyarrow==20.0.0
pycparser==2.22
pydantic==2.11.7
pydantic-core==2.33.2
pygments==2.19.1
pyopenjtalk-plus==0.4.1.post3
pytest==8.4.0
pytest-cov==6.2.1
python-dateutil==2.9.0.post0
pytz==2025.2
pyyaml==6.0.2
rapidfuzz==3.13.0
regex==2024.11.6
requests==2.32.4
ruff==0.11.13
safetensors==0.5.3
scikit-learn==1.7.0
scipy==1.15.3
sentry-sdk==2.30.0
setproctitle==1.3.6
setuptools==80.9.0
six==1.17.0
smmap==5.0.2
soundfile==0.13.1
soxr==0.5.0.post1
sudachidict-core==20250515
sudachipy==0.6.10
sympy==1.14.0
tensorboard==2.19.0
tensorboard-data-server==0.7.2
threadpoolctl==3.6.0
tokenizers==0.21.1
tomli==2.2.1
torch==2.7.1
torchaudio==2.7.1
tqdm==4.67.1
transformers==4.52.4
triton==3.3.1
typing-extensions==4.14.0
typing-inspection==0.4.1
tzdata==2025.2
urllib3==2.4.0
wandb==0.20.1
werkzeug==3.1.3
xxhash==3.5.0
yarl==1.20.1
```
📄 License
This project is licensed under the Apache-2.0 License.