🚀 Gervásio 8B PTPT
Gervásio 8B PTPT is an open decoder for the Portuguese language, based on the LLaMA 3.1 8B Instruct model. It can be run on consumer-grade hardware and is suitable for both research and commercial use.
✨ Features
- Open Decoder: An open-source decoder designed specifically for the Portuguese language.
- Based on LLaMA Family: Built on the LLaMA 3.1 8B Instruct model with the Transformer neural architecture.
- Trained on Diverse Data: Trained on various Portuguese language datasets, including [extraGLUE-Instruct](https://huggingface.co/datasets/PORTULAN/extraglue-instruct) and others.
- Open Distribution: Freely distributed under an open license for research and commercial purposes.
- Runs on Consumer-Grade Hardware: Due to its size, it can be run on common consumer-grade hardware.
📦 Installation
No model-specific installation steps are required: the model is used through the Hugging Face `transformers` library, so installing `transformers` and `torch` (e.g. `pip install transformers torch`) is sufficient to run the examples below.
💻 Usage Examples
Basic Usage
```python
# Use the model directly with a pipeline for causal language modeling
>>> from transformers import pipeline
>>> generator = pipeline(model='PORTULAN/gervasio-8b-portuguese-ptpt-decoder')
>>> generator("A comida portuguesa é", max_new_tokens=10)
```
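Since Gervásio 8B PTPT is fine-tuned on top of an instruction-tuned model with a chat template, it can also be prompted in chat format. The snippet below is an illustrative sketch, not part of the original card: the example prompt and the `torch_dtype`/`device_map` settings are assumptions, and chat-format input requires a reasonably recent `transformers` release (plus `accelerate` for `device_map="auto"`).

```python
# Illustrative chat-format usage (assumes a recent transformers release with
# chat support in text-generation pipelines; the prompt is only an example).
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="PORTULAN/gervasio-8b-portuguese-ptpt-decoder",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Descreve Lisboa numa frase."}]
outputs = generator(messages, max_new_tokens=50)
print(outputs[0]["generated_text"])
```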
📚 Documentation
Chatbot
This model is integrated into the chatbot Evaristo.ai, where its generative capabilities can be tried out interactively through a GUI.
Model Description
The model has 8 billion parameters, with 32 layers, a hidden size of 4096, an intermediate size of 14336, and 32 attention heads. It uses rotary position embeddings (RoPE) and a tokenizer with a vocabulary of 128,256 tokens.
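These figures can be checked directly against the model configuration; the snippet below is a quick verification sketch.

```python
# Quick check of the architecture figures listed above via the model config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("PORTULAN/gervasio-8b-portuguese-ptpt-decoder")
print(config.num_hidden_layers,    # 32 layers
      config.hidden_size,          # 4096
      config.intermediate_size,    # 14336
      config.num_attention_heads,  # 32
      config.vocab_size)           # 128256
```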
Training Data
Gervásio 8B PTPT was trained on various datasets, either native to European Portuguese or translated into European Portuguese. For the latter, only datasets whose translation into European Portuguese preserves the relevant linguistic properties of the target language were selected.
The training data comprises:
- [extraGLUE-Instruct](https://huggingface.co/datasets/PORTULAN/extraglue-instruct)
- MMLU PT (multiple choice question answering).
- A subset of Natural Instructions (mostly multiple choice question answering tasks).
- A manually curated subset of Wikipedia.
- A manually curated list of proverbs.
Training Details
We applied supervised fine-tuning with a causal language modeling training objective, following a zero-out technique during the fine-tuning process. Specifically, while the entire prompt and chat template received attention during fine-tuning, only the response tokens were subjected to back-propagation.
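As an illustration of this zero-out scheme, the sketch below (not the project's actual training code; the example prompt and response are made up) sets the labels of prompt and template tokens to -100, which Hugging Face causal-LM loss functions ignore, so that only the response tokens contribute to back-propagation.

```python
# Minimal sketch of the zero-out technique described above (illustrative only).
# Prompt/template tokens keep attention but are excluded from the loss by
# setting their labels to -100, which causal-LM cross-entropy in transformers
# ignores, so only response tokens are back-propagated.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PORTULAN/gervasio-8b-portuguese-ptpt-decoder")

prompt = "Pergunta: Qual é a capital de Portugal?\nResposta:"
response = " A capital de Portugal é Lisboa."

prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
response_ids = tokenizer(response, add_special_tokens=False)["input_ids"]

input_ids = prompt_ids + response_ids
# Zero out the prompt part: -100 labels contribute nothing to the loss.
labels = [-100] * len(prompt_ids) + list(response_ids)

assert len(input_ids) == len(labels)
```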
To accelerate training, the Fully Sharded Data Parallel (FSDP) paradigm was used over 10 L40S GPUs.
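For illustration, a minimal sketch of wrapping a causal LM with PyTorch FSDP is shown below; it is not the project's actual training setup, and the launcher, sharding policy, and optimizer settings are omitted.

```python
# Hedged FSDP sketch (illustrative; not the actual Gervásio training code).
# Run with: torchrun --nproc_per_node=<num_gpus> train_fsdp.py
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM

dist.init_process_group("nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = AutoModelForCausalLM.from_pretrained(
    "PORTULAN/gervasio-8b-portuguese-ptpt-decoder",
    torch_dtype=torch.bfloat16,
)
# Shard parameters, gradients, and optimizer state across the available GPUs.
model = FSDP(model, device_id=torch.cuda.current_device())
# ... the usual training loop follows: forward pass, loss over response tokens,
# backward pass, optimizer step.
```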
Performance
For testing, we use translations of the standard benchmarks GPQA Diamond, MMLU and MMLU Pro, as well as the CoPA, MRPC and RTE datasets in extraGLUE.
| Property | Details |
|----------|---------|
| Model Type | Gervásio 8B PTPT |
| Training Data | [extraGLUE-Instruct](https://huggingface.co/datasets/PORTULAN/extraglue-instruct), MMLU PT, a subset of Natural Instructions, a manually curated subset of Wikipedia, a manually curated list of proverbs |
| Model | GPQA Diamond PT | MMLU PT | MMLU Pro PT | CoPA | MRPC | RTE | Average |
|-------|-----------------|---------|-------------|------|------|-----|---------|
| Gervásio 8B PTPT | 34.85 | 62.15 | 36.79 | 87.00 | 77.45 | 77.62 | 62.64 |
| LLaMA 3.1 8B Instruct | 32.32 | 61.49 | 36.10 | 83.00 | 75.25 | 79.42 | 61.26 |
🔧 Technical Details
- Neural Architecture: Based on the Transformer neural architecture.
- Fine-Tuning Technique: Supervised fine-tuning with a causal language modeling training objective using a zero-out technique. Only response tokens are back-propagated during fine-tuning.
- Acceleration Method: The Fully Sharded Data Parallel (FSDP) paradigm was used over 10 L40S GPUs for training acceleration.
📄 License
This model is distributed under the MIT license.
```bibtex
@misc{gervasio,
  title={Advancing Generative AI for Portuguese with
         Open Decoder Gervásio PT-*},
  author={Rodrigo Santos, João Silva, Luís Gomes,
          João Rodrigues, António Branco},
  year={2024},
  eprint={2402.18766},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
Please use the above canonical reference when using or citing this model.
Acknowledgments
The research reported here was partially supported by:
- PORTULAN CLARIN—Research Infrastructure for the Science and Technology of Language, funded by Lisboa 2020, Alentejo 2020 and FCT—Fundação para a Ciência e Tecnologia under the grant PINFRA/22117/2016;
- the innovation project ACCELERAT.AI - Multilingual Intelligent Contact Centers, funded by IAPMEI, I.P. - Agência para a Competitividade e Inovação I.P. under the grant C625734525-00462629 of the Plano de Recuperação e Resiliência, call RE-C05-i01.01 – Agendas/Alianças Mobilizadoras para a Reindustrialização;
- the research project "Hey, Hal, curb your hallucination! / Enhancing AI chatbots with enhanced RAG solutions", funded by FCT—Fundação para a Ciência e a Tecnologia under the grant 2024.07592.IACDC;
- the project "CLARIN – Infraestrutura de Investigação para a Ciência e Tecnologia da Linguagem", funded by the Lisboa2030 programme under the grant LISBOA2030-FEDER-01316900PORTULAN.