🚀 ALMA (Advanced Language Model-based translator)
ALMA is an LLM-based translation model that introduces a novel translation paradigm: it is first fine-tuned on monolingual data and then further optimized with high-quality parallel data. This two-step fine-tuning process delivers strong translation performance. For more in-depth information, please refer to our paper:
```bibtex
@misc{xu2023paradigm,
      title={A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models},
      author={Haoran Xu and Young Jin Kim and Amr Sharaf and Hany Hassan Awadalla},
      year={2023},
      eprint={2309.11674},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
ALMA-R (NEW!) is now released! ALMA-R builds upon the ALMA models: instead of the supervised fine-tuning used in ALMA, it undergoes further LoRA fine-tuning with our proposed Contrastive Preference Optimization (CPO). CPO fine-tuning requires our [triplet preference data](https://huggingface.co/datasets/haoranxu/ALMA-R-Preference) for preference learning. ALMA-R can now match or even outperform GPT-4 and the WMT competition winners!
```bibtex
@misc{xu2024contrastive,
      title={Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation},
      author={Haoran Xu and Amr Sharaf and Yunmo Chen and Weiting Tan and Lingfeng Shen and Benjamin Van Durme and Kenton Murray and Young Jin Kim},
      year={2024},
      eprint={2401.08417},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
✨ Features
Model Release
We have released six translation models as presented in the paper:
- ALMA-7B: Full-weight fine-tune LLaMA-2-7B on 20B monolingual tokens, then full-weight fine-tune on human-written parallel data.
- ALMA-7B-LoRA: Full-weight fine-tune LLaMA-2-7B on 20B monolingual tokens, then LoRA fine-tune on human-written parallel data.
- ALMA-7B-R (NEW!): Further LoRA fine-tuning upon ALMA-7B-LoRA with contrastive preference optimization.
- ALMA-13B: Full-weight fine-tune LLaMA-2-13B on 12B monolingual tokens, then full-weight fine-tune on human-written parallel data.
- ALMA-13B-LoRA (our best system): Full-weight fine-tune LLaMA-2-13B on 12B monolingual tokens, then LoRA fine-tune on human-written parallel data.
- ALMA-13B-R (NEW!): Further LoRA fine-tuning upon ALMA-13B-LoRA with contrastive preference optimization.
Model Checkpoints
Model checkpoints are available on Hugging Face:
| Model | Base Model Link | LoRA Link |
| --- | --- | --- |
| ALMA-7B | [haoranxu/ALMA-7B](https://huggingface.co/haoranxu/ALMA-7B) | - |
| ALMA-7B-LoRA | [haoranxu/ALMA-7B-Pretrain](https://huggingface.co/haoranxu/ALMA-7B-Pretrain) | [haoranxu/ALMA-7B-Pretrain-LoRA](https://huggingface.co/haoranxu/ALMA-7B-Pretrain-LoRA) |
| ALMA-7B-R | [haoranxu/ALMA-7B-R (LoRA merged)](https://huggingface.co/haoranxu/ALMA-7B-R) | - |
| ALMA-13B | [haoranxu/ALMA-13B](https://huggingface.co/haoranxu/ALMA-13B) | - |
| ALMA-13B-LoRA | [haoranxu/ALMA-13B-Pretrain](https://huggingface.co/haoranxu/ALMA-13B-Pretrain) | [haoranxu/ALMA-13B-Pretrain-LoRA](https://huggingface.co/haoranxu/ALMA-13B-Pretrain-LoRA) |
| ALMA-13B-R | [haoranxu/ALMA-13B-R (LoRA merged)](https://huggingface.co/haoranxu/ALMA-13B-R) | - |
⚠️ Important Note
Note that ALMA-7B-Pretrain and ALMA-13B-Pretrain are NOT translation models. They have only undergone stage-1 monolingual fine-tuning (20B tokens for the 7B model and 12B tokens for the 13B model) and should be used in conjunction with their corresponding LoRA models.
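For example, a usable translation model is obtained by applying the corresponding LoRA weights on top of the stage-1 checkpoint (a minimal sketch; the full example is in the Usage Examples section below):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# The Pretrain checkpoint alone is not a translator; the LoRA adapter adds the stage-2 translation weights.
base = AutoModelForCausalLM.from_pretrained("haoranxu/ALMA-13B-Pretrain", device_map="auto")
model = PeftModel.from_pretrained(base, "haoranxu/ALMA-13B-Pretrain-LoRA")
```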
Datasets
Datasets used by ALMA and ALMA-R are also released on Hugging Face (NEW!):
| Dataset | Train / Validation Data | Test Data |
| --- | --- | --- |
| Human-written parallel data (ALMA) | [train and validation](https://huggingface.co/datasets/haoranxu/ALMA-Human-Parallel) | [WMT'22](https://huggingface.co/datasets/haoranxu/WMT22-Test) |
| Triplet preference data (ALMA-R) | [train](https://huggingface.co/datasets/haoranxu/ALMA-R-Preference) | [WMT'22](https://huggingface.co/datasets/haoranxu/WMT22-Test) and [WMT'23](https://huggingface.co/datasets/haoranxu/WMT23-Test) |
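These datasets can be loaded directly with the `datasets` library. The sketch below is illustrative only: the language-pair configuration names (e.g. `zh-en`) and split names are assumptions, so consult the dataset cards for the exact layout.

```python
# Illustrative sketch: load the human-written parallel data and the WMT'22 test set.
# Config names such as "zh-en" are assumptions -- check the dataset cards for the exact names.
from datasets import load_dataset

parallel = load_dataset("haoranxu/ALMA-Human-Parallel", "zh-en")  # train / validation splits
wmt22 = load_dataset("haoranxu/WMT22-Test", "zh-en")              # test data
print(parallel)
print(wmt22)
```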
💻 Usage Examples
Basic Usage
Here is a quick start for using our best system, ALMA-13B-LoRA, for translation. The example translates "我爱机器翻译。" ("I love machine translation.") into English:
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, LlamaTokenizer

# Load the stage-1 pretrained base model and apply the ALMA-13B LoRA weights.
model = AutoModelForCausalLM.from_pretrained("haoranxu/ALMA-13B-Pretrain", torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, "haoranxu/ALMA-13B-Pretrain-LoRA")
tokenizer = LlamaTokenizer.from_pretrained("haoranxu/ALMA-13B-Pretrain", padding_side='left')

# Add the source sentence to the translation prompt template.
prompt = "Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish:"
input_ids = tokenizer(prompt, return_tensors="pt", padding=True, max_length=40, truncation=True).input_ids.cuda()

# Translation
with torch.no_grad():
    generated_ids = model.generate(input_ids=input_ids, num_beams=5, max_new_tokens=20, do_sample=True, temperature=0.6, top_p=0.9)
outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print(outputs)
```
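The ALMA-R checkpoints in the table above already have their LoRA weights merged, so they can be loaded directly with `transformers` and no `peft` wrapper. The sketch below is illustrative and reuses the same prompt format; note that `batch_decode` returns the prompt together with the generation, so the translation is the text after the final `English:` marker.

```python
import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer

# ALMA-13B-R ships with the LoRA weights already merged into the base model.
model = AutoModelForCausalLM.from_pretrained("haoranxu/ALMA-13B-R", torch_dtype=torch.float16, device_map="auto")
tokenizer = LlamaTokenizer.from_pretrained("haoranxu/ALMA-13B-R", padding_side='left')

prompt = "Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish:"
input_ids = tokenizer(prompt, return_tensors="pt", padding=True, max_length=40, truncation=True).input_ids.cuda()

with torch.no_grad():
    generated_ids = model.generate(input_ids=input_ids, num_beams=5, max_new_tokens=20, do_sample=True, temperature=0.6, top_p=0.9)

# The decoded string includes the prompt; keep only the text after the final "English:".
outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print(outputs[0].split("English:")[-1].strip())
```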
📚 Documentation
Please find more details in our GitHub repository.
📄 License
This project is under the MIT license.