🚀 llama3-instrucTrans-enko-8b-GGUF
Quantized GGUF model files for seamless English-Korean translation
This repository provides quantized GGUF model files for llama3-instrucTrans-enko-8b created by nayohan. These files are optimized for efficient inference, enabling high-quality English to Korean translation.
✨ Features
- Multilingual Support: Supports both English (`en`) and Korean (`ko`), facilitating seamless cross-language communication.
- Translation Capability: Specifically designed for English-to-Korean translation tasks, with high-quality generation results.
- Quantized for Efficiency: The GGUF format ensures efficient inference, making it suitable for various applications (see the download sketch below).
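As a quick way to fetch one of the quantized files, the sketch below uses `huggingface_hub`; the repository id and GGUF filename are placeholders and should be replaced with the actual values from this repo's file listing.

```python
from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# NOTE: repo_id and filename are hypothetical placeholders; use the actual
# repository id and one of the .gguf filenames published in this repository.
gguf_path = hf_hub_download(
    repo_id="your-username/llama3-instrucTrans-enko-8b-GGUF",
    filename="llama3-instrucTrans-enko-8b.Q4_K_M.gguf",
)
print(gguf_path)  # local path to the downloaded GGUF file
```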
📦 Installation
The original (full-precision) model can be loaded with the `transformers` library. Ensure you have the necessary dependencies installed (for example `torch`, `transformers`, and `accelerate` for `device_map="auto"`):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the original (non-quantized) model and tokenizer from the Hugging Face Hub.
model_name = "nayohan/llama3-instrucTrans-enko-8b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
```
💻 Usage Examples
Basic Usage
To translate text from English to Korean, use the following Python code:
```python
# System prompt (Korean): "You are a translator. Translate English into Korean."
system_prompt = "당신은 번역기 입니다. 영어를 한국어로 번역하세요."
sentence = "The aerospace industry is a flower in the field of technology and science."

conversation = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": sentence},
]

# Build the Llama-3 chat prompt and move the token IDs to the GPU.
inputs = tokenizer.apply_chat_template(
    conversation,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

outputs = model.generate(inputs, max_new_tokens=4096)
print(tokenizer.decode(outputs[0][len(inputs[0]):]))
```
Example Results
```
# Result
INPUT: <|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n당신은 번역기 입니다. 영어를 한국어로 번역하세요.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nThe aerospace industry is a flower in the field of technology and science.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
OUTPUT: 항공우주 산업은 기술과 과학 분야의 꽃입니다.<|eot_id|>

INPUT: <|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n당신은 번역기 입니다. 영어를 한국어로 번역하세요.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nTechnical and basic sciences are very important in terms of research. It has a significant impact on the industrial development of a country. Government policies control the research budget.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
OUTPUT: 기술 및 기초 과학은 연구 측면에서 매우 중요합니다. 이는 한 국가의 산업 발전에 큰 영향을 미칩니다. 정부 정책은 연구 예산을 통제합니다.<|eot_id|>
```
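The example above runs the full-precision model via `transformers`; the GGUF files in this repository are instead intended for llama.cpp-compatible runtimes. Below is a minimal sketch using `llama-cpp-python`, assuming a GGUF file has already been downloaded (the filename is a placeholder) and reusing the same system prompt as above.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# The filename is a placeholder; point model_path at a downloaded .gguf file.
llm = Llama(
    model_path="llama3-instrucTrans-enko-8b.Q4_K_M.gguf",
    n_ctx=4096,       # context window
    n_gpu_layers=-1,  # offload all layers to GPU if a GPU-enabled build is installed
)

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "당신은 번역기 입니다. 영어를 한국어로 번역하세요."},
        {"role": "user", "content": "The aerospace industry is a flower in the field of technology and science."},
    ],
    max_tokens=512,
)
print(result["choices"][0]["message"]["content"])
```

Quantized weights trade a small amount of translation quality for lower memory use, so outputs may differ slightly from the full-precision results reported below.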
📚 Documentation
Model Details
Evaluation
The model's performance was evaluated on multiple datasets:
Aihub English-Korean Translation Dataset Evaluation

| model | aihub-111 | aihub-124 | aihub-125 | aihub-126 | aihub-563 | aihub-71265 | aihub-71266 | aihub-71382 | average |
|---|---|---|---|---|---|---|---|---|---|
| EEVE-10.8b-it | 6.15 | 11.81 | 5.78 | 4.99 | 6.31 | 10.99 | 9.41 | 6.44 | 7.73 |
| KULLM3 | 9.00 | 13.49 | 10.43 | 5.90 | 1.92 | 16.37 | 10.02 | 8.39 | 9.44 |
| Seagull-13B | 9.8 | 18.38 | 8.51 | 5.53 | 8.74 | 17.44 | 10.11 | 11.21 | 11.21 |
| Synatra-7B | 6.99 | 25.14 | 7.79 | 5.31 | 9.95 | 19.27 | 13.20 | 8.93 | 12.07 |
| nhndq-nllb | 24.09 | 48.71 | 22.89 | 13.98 | 18.71 | 30.18 | 32.49 | 18.62 | 26.20 |
| our-tech | 20.19 | 37.48 | 18.50 | 12.45 | 16.96 | 13.92 | 43.54 | 9.62 | 21.58 |
| our-general | 24.72 | 45.22 | 21.61 | 18.97 | 17.23 | 30.00 | 32.08 | 13.55 | 25.42 |
| our-sharegpt | 12.42 | 19.23 | 10.91 | 9.18 | 14.30 | 26.43 | 12.62 | 15.57 | 15.08 |
| our-instrucTrans | 24.89 | 47.00 | 22.78 | 21.78 | 24.27 | 27.98 | 31.31 | 15.42 | 26.92 |
FLoRes English-Korean Translation Dataset Evaluation

| model | flores-dev | flores-devtest | average |
|---|---|---|---|
| EEVE-10.8b-it | 10.99 | 11.71 | 11.35 |
| KULLM3 | 12.83 | 13.23 | 13.03 |
| Seagull-13B | 11.48 | 11.99 | 11.73 |
| Synatra-7B | 10.98 | 10.81 | 10.89 |
| nhndq-nllb | 12.79 | 15.15 | 13.97 |
| our-tech | 12.14 | 12.04 | 12.09 |
| our-general | 14.93 | 14.58 | 14.75 |
| our-sharegpt | 14.71 | 16.69 | 15.70 |
| our-instrucTrans | 14.49 | 17.69 | 16.09 |
iwslt-2023 Evaluation

| model | iwslt_zondae | iwslt_banmal | average |
|---|---|---|---|
| EEVE-10.8b-it | ... | ... | ... |
| ... | ... | ... | ... |
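The figures above are consistent with corpus-level BLEU, the usual metric for these translation benchmarks. As a rough illustration of how such scores can be computed, here is a minimal sketch using `sacrebleu`; the hypothesis/reference pairs are placeholders, and the exact tokenization behind the reported numbers is not specified here.

```python
import sacrebleu  # pip install sacrebleu

# Placeholder data: model outputs and their reference translations,
# aligned sentence by sentence.
hypotheses = ["항공우주 산업은 기술과 과학 분야의 꽃입니다."]
references = [["항공우주 산업은 기술과 과학 분야의 꽃입니다."]]  # one reference stream

# corpus_bleu takes the hypotheses and a list of reference streams.
# A Korean-aware tokenizer (e.g. sacrebleu's "ko-mecab", if installed)
# may be needed to reproduce reported scores exactly.
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(round(bleu.score, 2))
```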
📄 License
This model is released under the `llama3` license. Please review the license terms before use.