🚀 Model Card for TowerInstruct-7B-v0.1
TowerInstruct-7B-v0.1 is a fine-tuned language model designed to handle various translation-related tasks. It offers a wide range of capabilities in multiple languages, providing solutions for translation, automatic post-edition, and more.
📚 Documentation
✨ Features
- TowerInstruct-7B is a 7B parameter model fine-tuned on TowerBase using the TowerBlocks supervised fine-tuning dataset.
- It can handle multiple translation-related tasks, including general machine translation, automatic post-edition, named-entity recognition, and more.
- Supports 10 languages: English, Portuguese, Spanish, French, German, Dutch, Italian, Korean, Chinese, and Russian.
📦 Installation
The installation steps are shown in the code example for running the model. You may need to install transformers from source (for versions <= v4.34) and accelerate.
💻 Usage Examples
Basic Usage
```python
import torch
from transformers import pipeline

# Load the model in bfloat16; device_map="auto" places it on the available GPU(s).
pipe = pipeline("text-generation", model="Unbabel/TowerInstruct-7B-v0.1", torch_dtype=torch.bfloat16, device_map="auto")

# TowerInstruct expects ChatML-formatted prompts without a system prompt (see "Prompt Format" below).
messages = [
    {"role": "user", "content": "Translate the following text from Portuguese into English.\nPortuguese: Um grupo de investigadores lançou um novo modelo para tarefas relacionadas com tradução.\nEnglish:"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=False)
print(outputs[0]["generated_text"])
```
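
The same pipeline can be reused for any of the ten supported languages. The sketch below is illustrative only: the exact prompt wording is an assumption, and the prompt templates actually used for each task are available in TowerBlocks.

```python
# Illustrative follow-up query reusing the `pipe` object defined above
# (English -> German; the prompt wording here is an assumption, not an official template).
messages = [
    {"role": "user", "content": "Translate the following text from English into German.\nEnglish: A group of researchers released a new model for translation-related tasks.\nGerman:"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=False)
print(outputs[0]["generated_text"])
```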
🔧 Technical Details
Model Description
TowerInstruct-7B is a language model that results from fine-tuning TowerBase on the TowerBlocks supervised fine-tuning dataset. TowerInstruct-7B-v0.1 is the first model in the series. The model is trained to handle several translation-related tasks, such as general machine translation (e.g., sentence- and paragraph-level translation, terminology-aware translation, context-aware translation), automatic post-edition, named-entity recognition, grammatical error correction, and paraphrase generation. More details will be released in the upcoming technical report.
Intended uses & limitations
The model was initially fine-tuned on a filtered and preprocessed supervised fine-tuning dataset (TowerBlocks), which contains a diverse range of data sources:
- Translation (sentence and paragraph-level)
- Automatic Post Edition
- Machine Translation Evaluation
- Context-aware Translation
- Terminology-aware Translation
- Multi-reference Translation
- Named-entity Recognition
- Paraphrase Generation
- Synthetic Chat data
- Code instructions
The model is not guaranteed to perform well for languages other than the 10 languages it supports. It is not intended to be used as a conversational chatbot or code assistant, nor as a document-level translator.
Prompt Format
TowerInstruct-v0.1 was trained using the ChatML prompt templates without any system prompts. An example follows:
```
<|im_start|>user
{USER PROMPT}<|im_end|>
<|im_start|>assistant
{MODEL RESPONSE}<|im_end|>
<|im_start|>user
[...]
```
The prompts for all supervised tasks can be found in TowerBlocks. Multiple prompt templates were used for each task, and the difference in downstream performance should be minimal.
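
For illustration, the format above is what the model's chat template produces from a list of messages. A minimal sketch, assuming the tokenizer shipped with the model used in the usage example:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Unbabel/TowerInstruct-7B-v0.1")

messages = [
    {"role": "user", "content": "Translate the following text from Portuguese into English.\nPortuguese: Olá, mundo.\nEnglish:"},
]

# add_generation_prompt=True appends the opening <|im_start|>assistant tag,
# signalling the model to generate the assistant turn; no system prompt is added.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```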
Training Details
Training Data
The training data is available in the TowerBlocks dataset.
Training Hyperparameters
The following hyperparameters were used during training:
| Hyperparameter | Value |
|---|---|
| total_train_batch_size | 256 |
| learning_rate | 7e-06 |
| lr_scheduler_type | cosine |
| lr_scheduler_warmup_steps | 500 |
| weight_decay | 0.01 |
| optimizer | Adam with betas=(0.9, 0.999) and epsilon=1e-08 |
| num_epochs | 4 |
| max_seq_length | 2048 |
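
For reference, these hyperparameters map onto a standard Hugging Face TrainingArguments configuration roughly as sketched below. This is a sketch of the correspondence, not the actual training script: the per-device batch size and gradient accumulation steps are placeholders whose product (times the number of GPUs) would need to equal the total train batch size of 256, and the precision setting is an assumption.

```python
from transformers import TrainingArguments

# Sketch only: how the reported hyperparameters would translate to TrainingArguments.
training_args = TrainingArguments(
    output_dir="towerinstruct-7b-sft",   # hypothetical output directory
    per_device_train_batch_size=4,       # placeholder; combine with accumulation/GPU count to reach 256
    gradient_accumulation_steps=8,       # placeholder
    learning_rate=7e-6,
    lr_scheduler_type="cosine",
    warmup_steps=500,
    weight_decay=0.01,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    num_train_epochs=4,
    bf16=True,                           # assumption; the card does not state the training precision
)
# max_seq_length (2048) is not a TrainingArguments field; it is typically passed to the
# tokenizer/data pipeline or to an SFT trainer instead.
```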
📄 License
The model is licensed under CC-BY-NC-4.0. Llama 2 is licensed under the LLAMA 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.
📖 Citation
```bibtex
@misc{tower_llm_2024,
      title={Tower: An Open Multilingual Large Language Model for Translation-Related Tasks},
      author={Duarte M. Alves and José Pombal and Nuno M. Guerreiro and Pedro H. Martins and João Alves and Amin Farajian and Ben Peters and Ricardo Rei and Patrick Fernandes and Sweta Agrawal and Pierre Colombo and José G. C. de Souza and André F. T. Martins},
      year={2024},
      eprint={2402.17733},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
