FFXIV-JA-KO Translator: An open-source translation model for free Japanese-to-Korean translation in "Final Fantasy XIV"

Ffxiv Ja Ko Translator

Developed by sappho192

A translation model specifically designed for converting Japanese text in the game 'Final Fantasy XIV' into Korean

Supports Multiple Languages

Open Source License: MIT

#Game-specific translation #Japanese-Korean bilingual conversion #BERT-GPT architecture

Downloads 40

Release Time : 4/3/2023

Model Overview

This model is based on the Transformer architecture and is optimized for Japanese-to-Korean translation tasks in 'Final Fantasy XIV', accurately translating game-specific terms and common phrases.

Model Features

Game-specific translation

Optimized for 'Final Fantasy XIV' game content, with a focus on translating game-specific terminology accurately

Transformer-based architecture

Uses BERT-japanese (cl-tohoku/bert-base-japanese-v2) as the encoder and KoGPT2 (skt/kogpt2-base-v2) as the decoder

Multi-platform support

Provides models in both PyTorch and ONNX formats, supporting different deployment environments

Model Capabilities

Japanese to Korean text translation

Accurate translation of game terminology

Short sentence and paragraph translation

Use Cases

Game localization

In-game text translation

Translating Japanese interface and quest text in 'Final Fantasy XIV' into Korean

Accurately translates game-specific terms like 'ギルガメッシュ討伐戦' to '길가메쉬 토벌전'

Player communication assistance

Facilitating communication between Japanese and Korean players

Can translate common game phrases like '一緒に行きましょうか？' to '같이 가실래요?'

🚀 Japanese to Korean translator for FFXIV

This project provides a Japanese to Korean translator specifically designed for FFXIV. It utilizes transformer models and offers both PyTorch and ONNX-based inference methods.

🚀 Quick Start

The project is documented in detail in its GitHub repository.

✨ Features

Translation Pipeline: Specialized for translating Japanese text to Korean in the context of FFXIV.
Multiple Inference Methods: Supports both PyTorch and Optimum.OnnxRuntime for inference.
Training Notebook: A training notebook is provided for further model customization.
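
Both inference methods expose a generate method, and the two usage examples in this README share almost the same translate function. The sketch below shows one way to write that step once and reuse it for either backend. The names make_translator, FakeTokenizer, and FakeModel are hypothetical and not part of the project; the fakes only stand in for the real tokenizers and model so the sketch runs without downloading anything.

```python
from typing import Any, Callable


def make_translator(src_tokenizer: Any, trg_tokenizer: Any, model: Any) -> Callable[[str], str]:
    """Build translate(text) from a tokenizer pair and any model that has generate()."""

    def translate(text_src: str) -> str:
        # Tokenize the source text; the keyword arguments mirror the usage examples.
        inputs = src_tokenizer(
            text_src,
            return_attention_mask=False,
            return_token_type_ids=False,
            return_tensors="pt",
        )
        # Take the first (only) sequence and drop its first and last tokens.
        output = model.generate(**inputs, max_length=500)[0][1:-1]
        return trg_tokenizer.decode(output)

    return translate


# Hypothetical stand-ins, for illustration only.
class FakeTokenizer:
    def __call__(self, text, **kwargs):
        # Encode each character as its code point, in a batch of one.
        return {"input_ids": [[ord(ch) for ch in text]]}

    def decode(self, ids):
        return "".join(chr(i) for i in ids)


class FakeModel:
    def generate(self, input_ids, max_length):
        # Echo the input, wrapped in a start marker (0) and an end marker (1).
        return [[0, *input_ids[0], 1]]


translate = make_translator(FakeTokenizer(), FakeTokenizer(), FakeModel())
print(translate("abc"))
```

With the real objects, the same function would receive the BertJapaneseTokenizer, the KoGPT2 tokenizer, and either the PyTorch or the ONNX model. That has not been verified here and should be treated as an assumption.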

📦 Installation

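The README does not list installation steps. The usage examples below import transformers and torch, and the ONNX example additionally imports optimum and onnxruntime, so these packages need to be installed first.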

💻 Usage Examples

Basic Usage

Inference (PyTorch)

from transformers import (
    EncoderDecoderModel,
    PreTrainedTokenizerFast,
    BertJapaneseTokenizer,
)

import torch

encoder_model_name = "cl-tohoku/bert-base-japanese-v2"
decoder_model_name = "skt/kogpt2-base-v2"

src_tokenizer = BertJapaneseTokenizer.from_pretrained(encoder_model_name)
trg_tokenizer = PreTrainedTokenizerFast.from_pretrained(decoder_model_name)

# Change `./best_model` to the path of your model directory
model = EncoderDecoderModel.from_pretrained("./best_model")

text = "ギルガメッシュ討伐戦"
# text = "ギルガメッシュ討伐戦に行ってきます。一緒に行きましょうか？"

def translate(text_src):
    # Tokenize the Japanese source text into PyTorch tensors
    inputs = src_tokenizer(text_src, return_attention_mask=False, return_token_type_ids=False, return_tensors='pt')
    # Generate target token IDs; [0, 1:-1] takes the first sequence and drops its first and last tokens
    output = model.generate(**inputs, max_length=500)[0, 1:-1]
    # Decode the remaining token IDs into Korean text
    text_trg = trg_tokenizer.decode(output.cpu())
    return text_trg

print(translate(text))
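
The `[0, 1:-1]` slice in translate removes the first and last token of the generated sequence by position. This appears to assume that the output always begins with a start token and ends with an end token. If generation stops at max_length before an end token is produced, the slice would drop a real token instead. The sketch below contrasts positional stripping with stripping by token ID, using plain lists. The IDs 0 and 1 and both helper names are hypothetical, chosen only for illustration.

```python
from typing import Iterable, List


def strip_by_position(ids: List[int]) -> List[int]:
    """Mirror the `[1:-1]` slice: drop the first and last token unconditionally."""
    return ids[1:-1]


def strip_by_id(ids: List[int], special_ids: Iterable[int]) -> List[int]:
    """Drop only tokens whose ID is in `special_ids`, wherever they appear."""
    special = set(special_ids)
    return [t for t in ids if t not in special]


BOS, EOS = 0, 1  # hypothetical special-token IDs, not the model's real ones

finished = [BOS, 11, 12, 13, EOS]  # generation ended with an end token
truncated = [BOS, 11, 12, 13]      # generation stopped before an end token

print(strip_by_position(finished))         # both markers removed
print(strip_by_position(truncated))        # the last real token is lost
print(strip_by_id(truncated, {BOS, EOS}))  # only the start marker is removed
```

Tokenizers in transformers also accept skip_special_tokens=True in decode. Whether that helps here depends on which special tokens the target tokenizer has registered, which the README does not state.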

Inference (Optimum.OnnxRuntime)

Note that Optimum.OnnxRuntime currently still requires PyTorch as its backend. [Issue] You can use either the [ONNX] or the [quantized ONNX] model.

from transformers import BertJapaneseTokenizer, PreTrainedTokenizerFast
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from onnxruntime import SessionOptions
import torch

encoder_model_name = "cl-tohoku/bert-base-japanese-v2"
decoder_model_name = "skt/kogpt2-base-v2"

src_tokenizer = BertJapaneseTokenizer.from_pretrained(encoder_model_name)
trg_tokenizer = PreTrainedTokenizerFast.from_pretrained(decoder_model_name)

sess_options = SessionOptions()
sess_options.log_severity_level = 3  # mute warnings, including CleanUnusedInitializersAndNodeArgs
# Change subfolder to "onnxq" to use the quantized model
model = ORTModelForSeq2SeqLM.from_pretrained("sappho192/ffxiv-ja-ko-translator",
        sess_options=sess_options, subfolder="onnx")

texts = [
    "逃げろ!",  # Should be "도망쳐!"
    "初めまして.",  # "반가워요"
    "よろしくお願いします.",  # "잘 부탁드립니다."
    "ギルガメッシュ討伐戦",  # "길가메쉬 토벌전"
    "ギルガメッシュ討伐戦に行ってきます。一緒に行きましょうか？",  # "길가메쉬 토벌전에 갑니다. 같이 가실래요?"
    "夜になりました",  # "밤이 되었습니다"
    "ご飯を食べましょう."  # "음, 이제 식사도 해볼까요"
]


def translate(text_src):
    embeddings = src_tokenizer(text_src, return_attention_mask=False, return_token_type_ids=False, return_tensors='pt')
    print(f'Src tokens: {embeddings.data["input_ids"]}')
    embeddings = {k: v for k, v in embeddings.items()}

    output = model.generate(**embeddings, max_length=500)[0, 1:-1]
    print(f'Trg tokens: {output}')
    text_trg = trg_tokenizer.decode(output.cpu())
    return text_trg


for text in texts:
    print(translate(text))
    print()
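
The comments next to the sample sentences read as expected outputs, so they can double as a small regression check. The sketch below is one possible way to do that, assuming an exact string match is the right criterion (the model may well produce acceptable variants). EXPECTED, check_translations, and stub_translate are hypothetical names; the stub stands in for the real translate function so the sketch runs without the model.

```python
from typing import Callable, Dict, List, Tuple

# Source sentences and expected Korean output, copied from the comments in the example.
EXPECTED: Dict[str, str] = {
    "逃げろ!": "도망쳐!",
    "初めまして.": "반가워요",
    "よろしくお願いします.": "잘 부탁드립니다.",
    "ギルガメッシュ討伐戦": "길가메쉬 토벌전",
    "夜になりました": "밤이 되었습니다",
}


def check_translations(
    translate: Callable[[str], str], expected: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (source, expected, actual) for every sentence that does not match."""
    mismatches = []
    for src, want in expected.items():
        got = translate(src).strip()
        if got != want:
            mismatches.append((src, want, got))
    return mismatches


# Hypothetical stub in place of the real `translate`; it gets one sentence wrong.
def stub_translate(text: str) -> str:
    canned = dict(EXPECTED)
    canned["夜になりました"] = "밤입니다"
    return canned[text]


for src, want, got in check_translations(stub_translate, EXPECTED):
    print(f"{src}: expected {want!r}, got {got!r}")
```

To run the check against the model, pass the translate function defined in the example instead of the stub. Note that the example's translate also prints token IDs on each call.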

Advanced Usage

Training

See the training.ipynb notebook.

📚 Documentation

Demo

Click to try the demo.

See also the Windows app demo, which uses the ONNX model.

📄 License

This project is licensed under the MIT license.

Property	Details
Model Type	Transformer-based encoder-decoder model
Training Data	Helsinki-NLP/tatoeba_mt, sappho192/Tatoeba-Challenge-jpn-kor
Languages	Japanese, Korean
Pipeline Tag	Translation
Tags	python, transformer, pytorch
Inference	false

⚠️ Important Note

FINAL FANTASY is a registered trademark of Square Enix Holdings Co., Ltd.
