Qwen2-7B-Instruct-Better-Translation Open-Source Language Model - Optimize English-to-Chinese Translation, Output Smooth and Idiomatic Translations

Qwen2 7B Instruct Better Translation

Developed by sevenone

A language model fine-tuned from Qwen2-7B-Instruct and optimized for English-to-Chinese translation, prioritizing fluent, idiomatic translations over literal ones.

Machine Translation

Safetensors

Language: English
Open Source License: Apache-2.0
Tags: #Authentic English-to-Chinese Translation #DPO Optimized Translation #Natural Language Processing

Downloads 19

Release Time : 9/17/2024

Model Overview

This model is fine-tuned using Direct Preference Optimization (DPO) and is particularly suitable for users who need accurate and fluent translations of complex or nuanced English texts.

Model Features

Authentic Translation Priority

Uses Direct Preference Optimization to prioritize fluent and authentic translations over literal translations.

Large Context Support

Inherits Qwen2-7B-Instruct's ability to handle long contexts of up to 131,072 tokens.

Specialized Fine-Tuning

Fine-tuned using a custom English-to-Chinese preference dataset to optimize translation quality.

Model Capabilities

English-to-Chinese Translation

Natural Language Generation

Long Text Processing

Use Cases

Translation Services

Professional Document Translation

Translating technical documents, academic papers, and other professional content.

Produces professional translations that follow natural Chinese usage.

Literary Translation

Translating literary works or texts with nuanced meanings.

Preserves the mood and imagery of the original while reading naturally in Chinese.

🚀 Qwen2-7B-Instruct-Better-Translation

A fine-tuned language model based on Qwen2-7B-Instruct, optimized for high-quality English-to-Chinese translation.

🚀 Quick Start

To use this model, please ensure you have installed transformers>=4.37.0 to avoid any compatibility issues.
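The version requirement can be checked before loading anything. The following is a minimal sketch, not part of the original model card: the helper names (`parse_version`, `meets_minimum`, `check_transformers`) are hypothetical, and the parser is deliberately simplified (it compares only the leading numeric components, so a pre-release such as `4.37.0rc1` is treated like `4.37.0`).

```python
from importlib import metadata

MIN_TRANSFORMERS = (4, 37, 0)  # minimum version stated in this model card


def parse_version(version: str) -> tuple:
    """Return the leading numeric components, padded to three parts.

    Simplified on purpose: '4.41.0.dev0' -> (4, 41, 0), '4.37' -> (4, 37, 0).
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: tuple = MIN_TRANSFORMERS) -> bool:
    """True if the installed version is at least the required minimum."""
    return parse_version(installed) >= minimum


def check_transformers() -> str:
    """Describe whether the locally installed transformers meets the minimum."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return "transformers is not installed"
    status = "OK" if meets_minimum(installed) else "too old"
    return f"transformers {installed}: {status}"
```

Calling `print(check_transformers())` reports the installed version and whether it satisfies the requirement.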

You can load and use the model to translate English to Chinese as shown in the following code snippet:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sevenone/Qwen2-7B-Instruct-Better-Translation"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" places the weights on the available device(s) and
# requires the accelerate package; torch_dtype="auto" keeps the checkpoint's dtype.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)

prompt = "Translate the following sentence to Chinese: 'Artificial intelligence is transforming industries worldwide.'"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# Move the inputs to wherever the model was actually placed
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate the translation; unpacking the inputs also passes the attention mask
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
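For translating several sentences, the steps above can be wrapped in a small helper. This is a sketch under assumptions: `build_messages` and `translate` are hypothetical names, the prompt wording simply mirrors the Quick Start example rather than a documented requirement of the model, and `model` and `tokenizer` are expected to be the objects loaded in the snippet above.

```python
def build_messages(english_text: str,
                   system_prompt: str = "You are a helpful assistant.") -> list:
    """Build the chat messages for one translation request.

    The prompt wording copies the Quick Start example; it is an
    assumption that this phrasing suits every input.
    """
    prompt = f"Translate the following sentence to Chinese: '{english_text}'"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ]


def translate(model, tokenizer, english_text: str, max_new_tokens: int = 512) -> str:
    """Translate one English string with an already loaded model and tokenizer."""
    text = tokenizer.apply_chat_template(
        build_messages(english_text),
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the tokens generated after the prompt
    new_tokens = output_ids[0][inputs.input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

With the model loaded, `translate(model, tokenizer, "Knowledge is power.")` returns the generated Chinese text as a string.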

✨ Features

Qwen2-7B-Instruct-Better-Translation is designed to provide high-quality English-to-Chinese translations, particularly focusing on producing natural, idiomatic translations instead of literal, word-for-word translations. The fine-tuning process involved using a preference dataset where the chosen translations were idiomatic and the rejected translations were more literal. This model is ideal for users who need accurate and fluent translations for complex or nuanced English text.
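To make the chosen/rejected idea concrete, one preference record might look like the sketch below. Everything in it is illustrative: the `PreferencePair` class is hypothetical, the `prompt`/`chosen`/`rejected` field names follow a common convention for preference datasets rather than the author's documented schema, and the example sentence is invented for this illustration, not taken from the training data.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PreferencePair:
    """One training example: a prompt plus a preferred and a dispreferred reply."""
    prompt: str
    chosen: str    # idiomatic translation (preferred)
    rejected: str  # literal translation (dispreferred)

    def __post_init__(self):
        # A pair carries no preference signal if both sides are identical
        if self.chosen == self.rejected:
            raise ValueError("chosen and rejected must differ")


def to_record(pair: PreferencePair) -> dict:
    """Convert a pair to a plain dict, e.g. for writing JSON lines."""
    return asdict(pair)


# Illustrative example only, not from the author's dataset
example = PreferencePair(
    prompt="Translate the following sentence to Chinese: 'It is raining cats and dogs.'",
    chosen="外面下着倾盆大雨。",      # conveys the meaning: heavy rain
    rejected="天上正在下猫和狗。",    # word-for-word rendering of the idiom
)
```

Here the chosen side renders the idiom by its meaning, while the rejected side translates it word for word, which is the contrast the fine-tuning is described as rewarding.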

📚 Documentation

Model Summary

Qwen2-7B-Instruct-Better-Translation is a fine-tuned language model based on Qwen2-7B-Instruct, specifically optimized for improving English-to-Chinese translation. The model was fine-tuned using Direct Preference Optimization (DPO) with a custom dataset that prioritizes fluent, idiomatic translations (chosen) over literal translations (rejected).

Developers: sevenone

Property	Details
License	Qwen2 License
Base Model	Qwen2-7B-Instruct
Model Size	7B
Context Length	131,072 tokens (inherited from Qwen2-7B-Instruct)

For more details, please refer to our GitHub.

Training Details

The model was fine-tuned using Direct Preference Optimization (DPO), a method that optimizes the model to prefer certain outputs over others based on user-provided preferences. The training dataset consisted of English source sentences, with corresponding translations labeled as either "chosen" (idiomatic) or "rejected" (literal).

Training Framework: Hugging Face Transformers
Optimizer: AdamW
Training Method: LoRA with Direct Preference Optimization
Training Data: Custom preference dataset for English-to-Chinese translation
Preference Type: Favoring idiomatic translations (chosen) over literal translations (rejected)
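For readers unfamiliar with DPO, the per-example loss in its standard formulation can be sketched in plain Python. This is an illustration of the general objective, not the author's training code: the function names are hypothetical, and `beta=0.1` and the log-probabilities in the usage note are made-up values, since the model card does not state its hyperparameters.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single preference pair.

    Each argument is the summed log-probability of a full response
    (chosen or rejected) under the policy being trained or under the
    frozen reference model. beta is an assumed illustrative value.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)
```

When the policy matches the reference model the margin is zero and the loss equals log 2. The loss falls as the policy assigns relatively more probability to the chosen (idiomatic) translation than the reference does, for example `dpo_loss(-8.0, -12.0, -10.0, -10.0)` is lower than `dpo_loss(-10.0, -10.0, -10.0, -10.0)`.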

Citation

If sevenone/qwen2-7b-instruct-better-translation is helpful in your work, please cite it as:

@misc{sevenone_2024,
    author       = {sevenone},
    title        = {Qwen2-7B-Instruct-Better-Translation},
    year         = 2024,
    url          = {https://huggingface.co/sevenone/Qwen2-7B-Instruct-Better-Translation},
    publisher    = {Hugging Face}
}
