🚀 Gervásio 7B PTPT
Gervásio 7B PTPT is a fully open decoder for the Portuguese language, based on the LLaMA-2 7B model and trained on language resources specific to Portuguese. It has different versions for European and Brazilian Portuguese, and can be run on consumer-grade hardware.
✨ Features
- Fully Open: Gervásio PT* is a fully open decoder for the Portuguese language, openly distributed for free under an open license for both research and commercial purposes.
- Variant-Specific: It has different versions trained for European Portuguese (PTPT) and Brazilian Portuguese (PTBR).
- Hardware-Friendly: Given its size, it can be run on consumer-grade hardware.
📦 Installation
There is no dedicated package to install; the model is loaded through the Hugging Face Transformers library, as shown in the Usage Examples below.
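A typical setup uses Transformers on top of PyTorch; exact version requirements are not specified in the model card, so the command below is an assumption rather than an official requirement.

pip install torch transformers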
💻 Usage Examples
Basic Usage
>>> from transformers import pipeline
>>> generator = pipeline(model='PORTULAN/gervasio-7b-portuguese-ptpt-decoder')
>>> generator("A comida portuguesa é", max_new_tokens=10)
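For more control than the pipeline interface offers, the model can also be loaded explicitly. A minimal sketch, assuming a plain PyTorch setup; the device and dtype handling are left at their defaults and can be adjusted as needed.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PORTULAN/gervasio-7b-portuguese-ptpt-decoder"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # add device_map/dtype options as needed

# Encode a Portuguese prompt and generate a short continuation
inputs = tokenizer("A comida portuguesa é", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))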
📚 Documentation
Model Description
This model card is for Gervásio 7B PTPT, with 7 billion parameters, a hidden size of 4,096 units, an intermediate size of 11,008 units, 32 attention heads, 32 hidden layers, and a tokenizer obtained using the Byte-Pair Encoding (BPE) algorithm implemented with SentencePiece, featuring a vocabulary size of 32,000. Gervásio 7B PTPT is distributed under an MIT license.
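These architectural figures can be cross-checked against the published configuration; a minimal sketch, assuming the standard Transformers AutoConfig interface (the expected values in the comments come from the description above).

from transformers import AutoConfig

config = AutoConfig.from_pretrained("PORTULAN/gervasio-7b-portuguese-ptpt-decoder")
print(config.hidden_size)          # expected: 4096
print(config.intermediate_size)    # expected: 11008
print(config.num_attention_heads)  # expected: 32
print(config.num_hidden_layers)    # expected: 32
print(config.vocab_size)           # expected: 32000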
Training Data
Gervásio 7B PTPT was trained with standard supervised fine-tuning. To keep some alignment with mainstream benchmarks for English, tasks and respective datasets from the GLUE and SuperGLUE collections were used, in their versions machine-translated into European Portuguese and taken from the extraGLUE dataset. Instruction templates were manually crafted for each task and collected into the [extraGLUE-instruct](https://huggingface.co/datasets/PORTULAN/extraglue-instruct) dataset. Data augmentation techniques were also employed to enhance the dataset's size and diversity.
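The instruction dataset can be inspected directly; a minimal sketch, assuming it loads with the datasets library under its default configuration (split and column names may differ, so check the dataset card).

from datasets import load_dataset

# Load the instruction-tuning dataset referenced above
ds = load_dataset("PORTULAN/extraglue-instruct")
print(ds)                 # available splits
# print(ds["train"][0])   # inspect one example (split name is an assumption)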
Training Details
We applied supervised fine-tuning with a causal language modeling training objective, following a zero-out technique during the fine-tuning process: only the response tokens were subjected to back-propagation. The model was trained with a learning rate of 2 × 10^-5, a weight decay of 0.1, and a two-epoch training regime without warm-up. An input sequence of 512 tokens was used, with a batch size of 16 and 16 accumulation steps. Due to hardware limitations, each training example was handled individually, i.e. without packing multiple examples into the same input sequence.
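In Transformers terms, the zero-out technique corresponds to masking prompt positions out of the loss so that only response tokens are back-propagated. A minimal sketch under that assumption; the helper function and the non-hyper-parameter arguments below are illustrative, not the original training code.

import torch
from transformers import TrainingArguments

def zero_out_prompt(input_ids: torch.Tensor, prompt_length: int) -> torch.Tensor:
    # Labels start as a copy of the inputs; positions set to -100 are
    # ignored by the causal-LM loss, so only response tokens contribute.
    labels = input_ids.clone()
    labels[:prompt_length] = -100
    return labels

# Hyper-parameters reported above; output_dir and remaining defaults are assumptions.
args = TrainingArguments(
    output_dir="gervasio-7b-ptpt-sft",
    learning_rate=2e-5,
    weight_decay=0.1,
    num_train_epochs=2,
    warmup_steps=0,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=16,
)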
Performance
For testing, the translated datasets MRPC (similarity) and RTE (inference) from GLUE, and COPA (reasoning/QA) from SuperGLUE, were used.
Model | MRPC (F1) | RTE (F1) | COPA (F1)
--- | --- | --- | ---
Gervásio 7B PTPT | 0.7273 | 0.8291 | 0.5459
LLaMA-2 (English) | 0.0328 | 0.0482 | 0.3844
LLaMA-2 Chat (English) | 0.5703 | 0.4697 | 0.4737
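The evaluation script itself is not included here; the scores above are F1 values per task. As a minimal sketch of how such a score could be computed once model predictions and gold labels are available (the lists below are placeholders, not real data):

from sklearn.metrics import f1_score

gold = [1, 0, 1, 1, 0]  # placeholder gold labels for a binary task such as MRPC
pred = [1, 0, 0, 1, 0]  # placeholder model predictions
print(f1_score(gold, pred))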
Acknowledgments
The research reported here was partially supported by: PORTULAN CLARIN—Research Infrastructure for the Science and Technology of Language, funded by Lisboa 2020, Alentejo 2020 and FCT—Fundação para a Ciência e Tecnologia under the grant PINFRA/22117/2016; research project GPT-PT - Transformer-based Decoder for the Portuguese Language, funded by FCT—Fundação para a Ciência e Tecnologia under the grant CPCA-IAC/AV/478395/2022; innovation project ACCELERAT.AI - Multilingual Intelligent Contact Centers, funded by IAPMEI, I.P. - Agência para a Competitividade e Inovação under the grant C625734525-00462629, of Plano de Recuperação e Resiliência, call RE-C05-i01.01 – Agendas/Alianças Mobilizadoras para a Reindustrialização.
🔧 Technical Details
The model is a decoder of the LLaMA family, based on the Transformer neural architecture and developed over the LLaMA-2 7B model. During fine-tuning, a zero-out technique was applied to the causal language modeling training objective, with only response tokens subjected to back-propagation. Hyper-parameters include a learning rate of 2 × 10^-5, a weight decay of 0.1, a two-epoch training regime without warm-up, an input sequence of 512 tokens, a batch size of 16, and 16 accumulation steps.
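Assuming the batch size above is per device and is combined with the 16 accumulation steps, the effective batch works out to 16 × 16 = 256 sequences, i.e. 256 × 512 = 131,072 tokens per optimizer step; this figure is derived from the numbers above rather than stated in the card.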
📄 License
Gervásio 7B PTPT is distributed under an MIT license.
Citation
@misc{gervasio,
      title={Advancing Generative AI for Portuguese with
             Open Decoder Gervásio PT-*},
      author={Rodrigo Santos and João Silva and Luís Gomes
              and João Rodrigues and António Branco},
      year={2024},
      eprint={2402.18766},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
Please use the above canonical reference when using or citing this model.