🚀 Gervásio 7B PTPT
Gervásio 7B PTPT is a fully open decoder for the Portuguese language, based on the LLaMA-2 7B model and trained on language resources specific to Portuguese. It has different versions for European and Brazilian Portuguese, and can be run on consumer-grade hardware.
✨ Features
- Fully Open: Gervásio PT* is a fully open decoder for the Portuguese language, openly distributed for free under an open license for both research and commercial purposes.
- Variant-Specific: It has different versions trained for European Portuguese (PTPT) and Brazilian Portuguese (PTBR).
- Hardware-Friendly: Given its size, it can be run on consumer-grade hardware.
📦 Installation
There is no dedicated package to install; the model is loaded through the Hugging Face Transformers library, as shown in the Usage Examples below.
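A typical setup uses Transformers on top of PyTorch; exact version requirements are not specified in the model card, so the command below is an assumption rather than an official requirement.

pip install torch transformers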
💻 Usage Examples
Basic Usage
>>> from transformers import pipeline
>>> generator = pipeline(model='PORTULAN/gervasio-7b-portuguese-ptpt-decoder')
>>> generator("A comida portuguesa é", max_new_tokens=10)
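For more control than the pipeline interface offers, the model can also be loaded explicitly. A minimal sketch, assuming a plain PyTorch setup; the device and dtype handling are left at their defaults and can be adjusted as needed.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PORTULAN/gervasio-7b-portuguese-ptpt-decoder"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # add device_map/dtype options as needed

# Encode a Portuguese prompt and generate a short continuation
inputs = tokenizer("A comida portuguesa é", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))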
📚 Documentation
Model Description
This model card is for Gervásio 7B PTPT, with 7 billion parameters, a hidden size of 4,096 units, an intermediate size of 11,008 units, 32 attention heads, 32 hidden layers, and a tokenizer obtained using the Byte-Pair Encoding (BPE) algorithm implemented with SentencePiece, featuring a vocabulary size of 32,000. Gervásio 7B PTPT is distributed under an MIT license.
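These architectural figures can be cross-checked against the published configuration; a minimal sketch, assuming the standard Transformers AutoConfig interface (the expected values in the comments come from the description above).

from transformers import AutoConfig

config = AutoConfig.from_pretrained("PORTULAN/gervasio-7b-portuguese-ptpt-decoder")
print(config.hidden_size)          # expected: 4096
print(config.intermediate_size)    # expected: 11008
print(config.num_attention_heads)  # expected: 32
print(config.num_hidden_layers)    # expected: 32
print(config.vocab_size)           # expected: 32000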
Training Data
Gervásio 7B PTPT was trained with standard supervised fine-tuning. To keep some alignment with mainstream benchmarks for English, tasks and respective datasets from the GLUE and SuperGLUE collections were used, in their versions machine-translated into European Portuguese and taken from the extraGLUE dataset. Instruction templates were manually crafted for each task and collected into the [extraGLUE-instruct](https://huggingface.co/datasets/PORTULAN/extraglue-instruct) dataset. Data augmentation techniques were also employed to enhance the dataset's size and diversity.
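The instruction dataset can be inspected directly; a minimal sketch, assuming it loads with the datasets library under its default configuration (split and column names may differ, so check the dataset card).

from datasets import load_dataset

# Load the instruction-tuning dataset referenced above
ds = load_dataset("PORTULAN/extraglue-instruct")
print(ds)                 # available splits
# print(ds["train"][0])   # inspect one example (split name is an assumption)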
Training Details
We applied supervised fine-tuning with a causal language modeling training objective, following a zero-out technique during the fine-tuning process: only the response tokens were subjected to back-propagation. The model was trained with a learning rate of 2 × 10^-5, a weight decay of 0.1, and a two-epoch training regime without warm-up. An input sequence of 512 tokens was used, with a batch size of 16 and 16 accumulation steps. Due to hardware limitations, each training example was handled individually, i.e. without packing multiple examples into the same input sequence.
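In Transformers terms, the zero-out technique corresponds to masking prompt positions out of the loss so that only response tokens are back-propagated. A minimal sketch under that assumption; the helper function and the non-hyper-parameter arguments below are illustrative, not the original training code.

import torch
from transformers import TrainingArguments

def zero_out_prompt(input_ids: torch.Tensor, prompt_length: int) -> torch.Tensor:
    # Labels start as a copy of the inputs; positions set to -100 are
    # ignored by the causal-LM loss, so only response tokens contribute.
    labels = input_ids.clone()
    labels[:prompt_length] = -100
    return labels

# Hyper-parameters reported above; output_dir and remaining defaults are assumptions.
args = TrainingArguments(
    output_dir="gervasio-7b-ptpt-sft",
    learning_rate=2e-5,
    weight_decay=0.1,
    num_train_epochs=2,
    warmup_steps=0,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=16,
)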
Performance
For testing, the translated datasets MRPC (similarity) and RTE (inference) from GLUE, and COPA (reasoning/QA) from SuperGLUE, were used.
Model | MRPC (F1) | RTE (F1) | COPA (F1)
--- | --- | --- | ---
Gervásio 7B PTPT | 0.7273 | 0.8291 | 0.5459
LLaMA-2 (English) | 0.0328 | 0.0482 | 0.3844
LLaMA-2 Chat (English) | 0.5703 | 0.4697 | 0.4737
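The evaluation script itself is not included here; the scores above are F1 values per task. As a minimal sketch of how such a score could be computed once model predictions and gold labels are available (the lists below are placeholders, not real data):

from sklearn.metrics import f1_score

gold = [1, 0, 1, 1, 0]  # placeholder gold labels for a binary task such as MRPC
pred = [1, 0, 0, 1, 0]  # placeholder model predictions
print(f1_score(gold, pred))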
Acknowledgments
The research reported here was partially supported by: PORTULAN CLARIN—Research Infrastructure for the Science and Technology of Language, funded by Lisboa 2020, Alentejo 2020 and FCT—Fundação para a Ciência e Tecnologia under the grant PINFRA/22117/2016; research project GPT-PT - Transformer-based Decoder for the Portuguese Language, funded by FCT—Fundação para a Ciência e Tecnologia under the grant CPCA-IAC/AV/478395/2022; innovation project ACCELERAT.AI - Multilingual Intelligent Contact Centers, funded by IAPMEI, I.P. - Agência para a Competitividade e Inovação under the grant C625734525-00462629, of Plano de Recuperação e Resiliência, call RE-C05-i01.01 – Agendas/Alianças Mobilizadoras para a Reindustrialização.
🔧 Technical Details
The model is a decoder of the LLaMA family, based on the Transformer neural architecture and developed over the LLaMA-2 7B model. During fine-tuning, a zero-out technique was applied to the causal language modeling training objective, with only response tokens subjected to back-propagation. Hyper-parameters include a learning rate of 2 × 10^-5, a weight decay of 0.1, a two-epoch training regime without warm-up, an input sequence of 512 tokens, a batch size of 16, and 16 accumulation steps.
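Assuming the batch size above is per device and is combined with the 16 accumulation steps, the effective batch works out to 16 × 16 = 256 sequences, i.e. 256 × 512 = 131,072 tokens per optimizer step; this figure is derived from the numbers above rather than stated in the card.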
📄 License
Gervásio 7B PTPT is distributed under an MIT license.
Citation
@misc{gervasio,
      title={Advancing Generative AI for Portuguese with
             Open Decoder Gervásio PT-*},
      author={Rodrigo Santos and João Silva and Luís Gomes
              and João Rodrigues and António Branco},
      year={2024},
      eprint={2402.18766},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
Please use the above canonical reference when using or citing this model.