# 🚀 Boana-7B-Instruct
Boana-7B-Instruct is a Large Language Model (LLM) trained on Portuguese-language data. It aims to expand the range of Portuguese-language LLMs while remaining less computationally demanding, so that users with limited computing power can run it.
## 🚀 Quick Start
A minimal inference script is provided in the 💻 Usage Examples section below.
## ✨ Features
- Based on LLaMA2-7B: Boana-7B-Instruct is built on LLaMA2-7B, the 7-billion-parameter version of Meta's LLaMA 2.
- Portuguese Language Focus: Trained on Portuguese language data to better serve Portuguese-speaking users.
- Lower Computational Requirements: Designed to be accessible to users with less powerful computing resources.
The model was developed in support of Portuguese-speaking countries.
## 📦 Installation
The original README provides no dedicated installation steps; the usage example below only requires PyTorch and the 🤗 Transformers library, e.g. `pip install torch transformers`.
## 💻 Usage Examples
### Basic Usage
```python
import torch
from transformers import pipeline

# Load the model in bfloat16 with automatic device placement
boana = pipeline('text-generation', model='lrds-code/boana-7b-instruct',
                 torch_dtype=torch.bfloat16, device_map='auto')

messages = [{'role': 'system', 'content': ''},
            # "How many planets are there in the solar system?"
            {'role': 'user', 'content': 'Quantos planetas existem no sistema solar?'}]

# Render the conversation with the model's chat template
prompt = boana.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Greedy decoding; temperature, top_k and top_p only take effect when do_sample=True
outputs = boana(prompt, max_new_tokens=256, do_sample=False)
print(outputs[0]['generated_text'])
```
## 📚 Documentation
### Model Description
| Property | Details |
|----------|---------|
| Developed by | Leonardo Souza |
| Model Type | LLaMA-Based |
| License | Academic Free License v3.0 |
| Fine-tuned from | LLaMA2-7B |
### Important Parameters
- repetition_penalty: Discourages the repetition of words or phrases. When set above 1, the model lowers the probability of generating tokens that have already appeared; in general, the larger the value, the more strongly repetition is avoided.
- do_sample: Determines whether the model randomly samples the next word from the calculated probabilities. do_sample=True introduces variation and unpredictability into the generated text, while do_sample=False makes the model always choose the most probable next word, which is deterministic and can be more repetitive.
- temperature: Controls the randomness of the next-word choice. A low value (close to 0) makes the model more "confident" in its choices, favoring high-probability words and producing more predictable output; a high value increases randomness, allowing less probable words and yielding more varied, creative text (see the combined example below).
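For illustration, the sketch below reuses the `boana` pipeline and `prompt` from the Basic Usage example with sampling enabled; the specific values (temperature=0.7, repetition_penalty=1.2, and so on) are illustrative choices, not recommendations from the original README.

```python
# Sampled generation; the values below are illustrative, not tuned defaults.
outputs = boana(
    prompt,
    max_new_tokens=256,
    do_sample=True,          # sample from the distribution instead of greedy decoding
    temperature=0.7,         # < 1 sharpens the distribution toward high-probability words
    top_k=50,                # restrict sampling to the 50 most probable tokens
    top_p=0.95,              # nucleus sampling over the top 95% of probability mass
    repetition_penalty=1.2,  # > 1 penalizes tokens that have already appeared
)
print(outputs[0]['generated_text'])
```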
## 🔧 Technical Details
The model boana-7b-instruct is a fine-tuned version of LLaMA2-7B, trained on Portuguese-language data to improve its performance on Portuguese text generation.
## 📄 License
The model is released under the Academic Free License v3.0.
## Model Performance

| Task | Dataset | Metric | Value |
|------|---------|--------|-------|
| Text Generation | Muennighoff/xwinograd (pt, test split) | Accuracy | 50.57 |
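The dataset and split names match the xwinograd_pt task in EleutherAI's lm-evaluation-harness, so the score was plausibly produced with that tool; this is an assumption, as the original README does not name the evaluation framework. A minimal reproduction sketch under that assumption:

```python
# Hypothetical reproduction sketch using EleutherAI's lm-evaluation-harness
# (pip install lm-eval); the choice of harness is an assumption, not stated
# in the original README.
import lm_eval

results = lm_eval.simple_evaluate(
    model='hf',
    model_args='pretrained=lrds-code/boana-7b-instruct,dtype=bfloat16',
    tasks=['xwinograd_pt'],
)
print(results['results']['xwinograd_pt'])  # accuracy on the pt test split
```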