# Japanese to Korean Translator
This is a Japanese-to-Korean translator model. It is built on the `EncoderDecoderModel` architecture, combining [bert-japanese](https://huggingface.co/cl-tohoku/bert-base-japanese) and [kogpt2](https://github.com/SKT-AI/KoGPT2), and translates Japanese text into Korean.
## Quick Start

### Demo

You can visit the demo at https://huggingface.co/spaces/sappho192/aihub-ja-ko-translator-demo.
⨠Features
- **Model Architecture**: Uses the `EncoderDecoderModel` with `bert-japanese` as the encoder and `kogpt2` as the decoder.
- **Language Pair**: Specialized in Japanese-to-Korean translation.
## 📦 Installation
### Dependencies (PyPI)

- torch
- transformers
- fugashi
- unidic-lite
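Assuming a standard Python environment, the dependencies above can be installed in one step (this model card does not pin exact package versions):

```shell
pip install torch transformers fugashi unidic-lite
```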
## 💻 Usage Examples

### Basic Usage
```python
from transformers import (
    EncoderDecoderModel,
    PreTrainedTokenizerFast,
    BertJapaneseTokenizer,
)
import torch

encoder_model_name = "cl-tohoku/bert-base-japanese-v2"
decoder_model_name = "skt/kogpt2-base-v2"

src_tokenizer = BertJapaneseTokenizer.from_pretrained(encoder_model_name)
trg_tokenizer = PreTrainedTokenizerFast.from_pretrained(decoder_model_name)
model = EncoderDecoderModel.from_pretrained("sappho192/aihub-ja-ko-translator")

text = "初めまして。よろしくお願いします。"


def translate(text_src):
    # Tokenize the Japanese source text
    embeddings = src_tokenizer(text_src, return_attention_mask=False,
                               return_token_type_ids=False, return_tensors='pt')
    embeddings = {k: v for k, v in embeddings.items()}
    # Generate Korean token IDs, dropping the leading and trailing special tokens
    output = model.generate(**embeddings, max_length=500)[0, 1:-1]
    text_trg = trg_tokenizer.decode(output.cpu())
    return text_trg


print(translate(text))
```
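One detail worth noting in `translate` above: the `[0, 1:-1]` slice selects the first (and only) sequence in the batch and strips the decoder's leading and trailing special tokens (BOS/EOS) before decoding. A minimal illustration of that slicing with plain Python lists and made-up token IDs:

```python
# generate() returns a batch of token-ID sequences; here is a toy stand-in
# with hypothetical IDs, where 1 plays the role of BOS and 2 of EOS.
generated = [[1, 503, 88, 2971, 2]]

# Mirrors model.generate(...)[0, 1:-1]: first sequence, without BOS/EOS.
trimmed = generated[0][1:-1]
print(trimmed)  # [503, 88, 2971]
```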
## Documentation

### Dataset
This model was trained on datasets from 'The Open AI Dataset Project (AI-Hub, South Korea)'. All dataset information can be accessed through 'AI-Hub (aihub.or.kr)'.
### ⚠️ Important Note
For a corporation, organization, or individual located outside of South Korea to use the AI data, a separate agreement with the performing organization and the Korea National Information Society Agency (NIA) is required. Likewise, exporting the AI data outside the country requires a separate agreement with the performing organization and the NIA. Link
### Dataset list
The dataset used to train the model is merged from the following sub-datasets:

- Everyday life and colloquial Korean-Chinese, Korean-Japanese translation parallel corpus data [Link]
- Korean-multilingual (excluding English) translation corpus (science and technology) [Link]
- Korean-multilingual translation corpus (basic science) [Link]
- Korean-multilingual translation corpus (humanities) [Link]
- Korean-Japanese translation corpus [Link]
To reproduce the merged dataset, you can use the code at https://github.com/sappho192/aihub-translation-dataset.
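Conceptually, the merge is a concatenation of Japanese–Korean sentence pairs drawn from each sub-dataset. A minimal sketch of that idea, with hypothetical file contents and a TSV layout (the linked repository contains the actual preprocessing code):

```python
import csv
import io

# Hypothetical: each sub-dataset is a TSV file of "japanese<TAB>korean" pairs.
sub_datasets = [
    "こんにちは\t안녕하세요\n",            # stand-in for one corpus file
    "ありがとうございます\t감사합니다\n",   # stand-in for another
]

merged = []
for blob in sub_datasets:
    reader = csv.reader(io.StringIO(blob), delimiter="\t")
    merged.extend(tuple(row) for row in reader)

print(len(merged))  # 2 sentence pairs in the merged corpus
```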
## License
This project is licensed under the MIT license.