🚀 Sugoi v4 JPN->ENG NMT Model by MingShiba
This is a high-performance Japanese-to-English neural machine translation (NMT) model developed by MingShiba, packaged here in CTranslate2 format so it can be loaded with `ctranslate2.Translator`.
🚀 Quick Start
📦 Installation
- Check that Python is installed:
```shell
python --version
```
- Install the `huggingface_hub` library:
```shell
python -m pip install huggingface_hub
```
- Enter the Python interactive environment:
```shell
python
```
💻 Usage Examples
Basic Usage
Download the model using Python:
```python
import huggingface_hub

# Download every file in the model repository into a local folder
huggingface_hub.snapshot_download('entai2965/sugoi-v4-ja-en-ctranslate2', local_dir='sugoi-v4-ja-en-ctranslate2')
```
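After the download finishes, it can help to confirm that the tokenizer files the translation scripts expect are present. The sketch below assumes the layout implied by the usage code in this README (an `spm` subfolder containing `spm.ja.nopretok.model` and `spm.en.nopretok.model`); the helper name `missing_tokenizer_files` is hypothetical, and it checks only the SentencePiece files, not the CTranslate2 model weights.

```python
import os


def missing_tokenizer_files(model_path):
    """Return the expected SentencePiece files that are absent under model_path.

    Hypothetical helper: the expected layout (spm/spm.ja.nopretok.model and
    spm/spm.en.nopretok.model) is assumed from the usage code in this README.
    """
    expected = [
        os.path.join(model_path, 'spm', 'spm.ja.nopretok.model'),
        os.path.join(model_path, 'spm', 'spm.en.nopretok.model'),
    ]
    # Keep only the paths that do not exist as regular files
    return [path for path in expected if not os.path.isfile(path)]


missing = missing_tokenizer_files('sugoi-v4-ja-en-ctranslate2')
if missing:
    print('Missing files:', missing)
else:
    print('Tokenizer files found.')
```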
Advanced Usage
Run the model in batch mode.
For background, see the CTranslate2 Fairseq guide.
- Open a command prompt (cmd on Windows).
- Install the required libraries:
```shell
python -m pip install ctranslate2 sentencepiece
```
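To confirm that the packages installed so far are importable before running the scripts, `importlib.util.find_spec` can look up each module without importing it. This is a minimal sketch; the helper name `check_packages` is hypothetical, and the package list simply mirrors the `pip install` commands in this README.

```python
import importlib.util


def check_packages(names):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(['huggingface_hub', 'ctranslate2', 'sentencepiece'])
for name, found in status.items():
    # Report each package; a missing one can be installed with pip
    print(name, 'OK' if found else 'missing (install it with pip)')
```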
- Enter the Python interactive environment and run:
```python
import ctranslate2
import sentencepiece

model_path = 'sugoi-v4-ja-en-ctranslate2'
sentencepiece_model_path = model_path + '/spm'
device = 'cpu'  # use 'cuda' to run on an NVIDIA GPU

string1 = 'は静かに前へと歩み出た。'
string2 = '悲しいGPTと話したことがありますか?'
raw_list = [string1, string2]

# Load the CTranslate2 model and one SentencePiece tokenizer per language
translator = ctranslate2.Translator(model_path, device=device)
tokenizer_for_source_language = sentencepiece.SentencePieceProcessor(sentencepiece_model_path + '/spm.ja.nopretok.model')
tokenizer_for_target_language = sentencepiece.SentencePieceProcessor(sentencepiece_model_path + '/spm.en.nopretok.model')

# Split each Japanese sentence into SentencePiece subword tokens
tokenized_batch = []
for text in raw_list:
    tokenized_batch.append(tokenizer_for_source_language.encode(text, out_type=str))

# Translate the whole batch at once with beam search
translated_batch = translator.translate_batch(source=tokenized_batch, beam_size=5)
assert len(raw_list) == len(translated_batch)

# Detokenize the best hypothesis for each sentence and drop unknown-token markers
for count, tokens in enumerate(translated_batch):
    translated_batch[count] = tokenizer_for_target_language.decode(tokens.hypotheses[0]).replace('<unk>', '')

for text in translated_batch:
    print(text)
```
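The script above hard-codes `device='cpu'`. If an NVIDIA GPU is available, CTranslate2 can also run with `device='cuda'`. The sketch below chooses the device at runtime using `ctranslate2.get_cuda_device_count()`, falling back to the CPU when no CUDA device is reported or ctranslate2 is not installed; the helper name `pick_device` is hypothetical.

```python
def pick_device():
    """Return 'cuda' if CTranslate2 reports at least one CUDA device, else 'cpu'."""
    try:
        import ctranslate2
    except ImportError:
        # Without ctranslate2 installed there is nothing to query; default to CPU
        return 'cpu'
    return 'cuda' if ctranslate2.get_cuda_device_count() > 0 else 'cpu'


device = pick_device()
print('Using device:', device)
```

The returned value can be passed as the `device` argument to `ctranslate2.Translator`.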
Functional Programming Version
```python
import ctranslate2
import sentencepiece

model_path = 'sugoi-v4-ja-en-ctranslate2'
sentencepiece_model_path = model_path + '/spm'
device = 'cpu'

string1 = 'は静かに前へと歩み出た。'
string2 = '悲しいGPTと話したことがありますか?'
raw_list = [string1, string2]

translator = ctranslate2.Translator(model_path, device=device)
tokenizer_for_source_language = sentencepiece.SentencePieceProcessor(sentencepiece_model_path + '/spm.ja.nopretok.model')
tokenizer_for_target_language = sentencepiece.SentencePieceProcessor(sentencepiece_model_path + '/spm.en.nopretok.model')

# Tokenize, translate, and detokenize in a single expression
translated_batch = [
    tokenizer_for_target_language.decode(tokens.hypotheses[0]).replace('<unk>', '')
    for tokens in translator.translate_batch(
        source=[tokenizer_for_source_language.encode(text, out_type=str) for text in raw_list],
        beam_size=5,
    )
]
assert len(raw_list) == len(translated_batch)

for text in translated_batch:
    print(text)
```
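Both versions run once over a fixed list. When translating repeatedly, it can be convenient to load the model and tokenizers once and wrap the tokenize, translate, and detokenize steps in a function. The sketch below does that; `translate_texts` is a hypothetical name, and unlike the scripts in this README it also collapses the double spaces that can remain after `<unk>` markers are removed (an added step, not part of the original recipe).

```python
def translate_texts(texts, translator, source_tokenizer, target_tokenizer, beam_size=5):
    """Translate a list of Japanese strings into English.

    translator is expected to be a ctranslate2.Translator and the tokenizers
    sentencepiece.SentencePieceProcessor instances, loaded as in the scripts above.
    """
    # Split each input sentence into subword pieces
    tokenized = [source_tokenizer.encode(text, out_type=str) for text in texts]
    results = translator.translate_batch(source=tokenized, beam_size=beam_size)
    outputs = []
    for result in results:
        # Detokenize the best hypothesis and drop unknown-token markers
        text = target_tokenizer.decode(result.hypotheses[0]).replace('<unk>', '')
        # Collapse any runs of whitespace left behind by the removal
        outputs.append(' '.join(text.split()))
    return outputs
```

With the objects from the scripts above already loaded, a call looks like `translate_texts(raw_list, translator, tokenizer_for_source_language, tokenizer_for_target_language)`.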
📄 License
- License type: Other
- License name: ntt-license
- License link: LICENSE
📋 Model Information
| Property | Details |
|---|---|
| Pipeline Tag | Translation |
| Library Name | fairseq |
| Tags | nmt |
| Supported Languages | Japanese, English |