Caramelinho Open-Source Bilingual Text Generation Model - Free Deployment for Portuguese-English Text Creation

Caramelinho

Developed by Bruno

A bilingual (Portuguese/English) text generation model fine-tuned from Falcon-7B using the QLoRA method

Tags: Large Language Model, Supports Multiple Languages, Portuguese instruction fine-tuning, QLoRA efficient fine-tuning, Falcon-7B foundation

Downloads: 22

Release date: 6/9/2023

Model Overview

This is a PEFT adapter for Falcon-7B, fine-tuned on the Canarim instruction dataset, that supports text generation in both Portuguese and English.

Model Features

Bilingual support

Optimized specifically for Portuguese and English, and suitable for bilingual text generation.

Efficient fine-tuning

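Uses QLoRA, a parameter-efficient fine-tuning method that keeps the base model frozen in 4-bit precision and trains only small low-rank adapter (LoRA) weights. This reduces the memory needed for fine-tuning while aiming to preserve the quality of full-precision fine-tuning.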

Instruction following

Fine-tuned on instruction data so that it better understands and carries out user instructions.
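QLoRA keeps the frozen base model in 4-bit precision, and that is where most of the memory saving comes from. As a back-of-the-envelope sketch, weight memory is the parameter count times the bytes stored per parameter. The code below assumes exactly 7 × 10^9 parameters for a "7B" model and ignores activations, the KV cache, optimizer state, and the small overhead of NF4 quantization constants, so treat the figures as lower bounds rather than measurements.

```python
# Rough weight-memory estimate for a model stored at different precisions.
# Assumption: 7e9 parameters; activations, KV cache, optimizer state and
# quantization constants are ignored, so real usage will be higher.

BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "nf4 (4-bit)": 0.5,
}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return the memory needed to hold the weights, in gigabytes (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    n_params = 7e9  # assumed parameter count for a "7B" model
    for dtype, nbytes in BYTES_PER_PARAM.items():
        print(f"{dtype:>12}: {weight_memory_gb(n_params, nbytes):5.1f} GB")
```

Under these assumptions the weights take about 14 GB in float16 but about 3.5 GB in 4-bit, which is why a QLoRA setup can work with a 7B model on a single GPU with much less memory.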

Model Capabilities

Text generation

Instruction response

Multi-turn dialogue

Content creation

Use Cases

Education

Language learning assistant

Helps Portuguese learners with bilingual practice and explanations.

Can generate grammar explanations and translation examples.

Content creation

Bilingual content generation

Generates marketing copy or social media content in Portuguese and English.

Can help keep a consistent brand voice across both languages while speeding up content production.

🚀 Caramelinho

This project fine-tunes the base model Falcon-7B on the Canarim dataset using the QLoRA method. It is intended for practical text generation tasks.

🚀 Quick Start

Prerequisites

Ensure you have the following frameworks installed:

Transformers 4.30.0.dev0
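PyTorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3

The usage example below also requires peft (for PeftModel and PeftConfig), bitsandbytes (for 4-bit loading), and accelerate (for device_map).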

Usage Example

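```python
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

peft_model_id = "Bruno/Caramelinho"

# The adapter config records which base model (Falcon-7B) it was trained on
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model in 4-bit NF4 precision, computing in float16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
# Falcon's tokenizer may not define a pad token; fall back to EOS
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map={"": 0},  # put the whole model on GPU 0
)
# Apply the Caramelinho adapter on top of the quantized base model
model = PeftModel.from_pretrained(model, peft_model_id)
model.eval()

# Template with extra context. In English: "Below is a statement that describes a task,
# together with an input that provides more context. Write a response that correctly
# completes the request."
prompt_input = "Abaixo está uma declaração que descreve uma tarefa, juntamente com uma entrada que fornece mais contexto. Escreva uma resposta que conclua corretamente a solicitação.\n\n ### Instrução:\n{instruction}\n\n### Entrada:\n{input}\n\n### Resposta:\n"
# Template without context. In English: "Below is an instruction that describes a task.
# Write a response that correctly completes the request."
prompt_no_input = "Abaixo está uma instrução que descreve uma tarefa. Escreva uma resposta que conclua corretamente a solicitação.\n\n### Instrução:\n{instruction}\n\n### Resposta:\n"


def create_prompt(instruction, input=None):
    """Fill the appropriate template depending on whether extra input is given."""
    if input:
        return prompt_input.format(instruction=instruction, input=input)
    return prompt_no_input.format(instruction=instruction)


def generate(
    instruction,
    input=None,
    max_new_tokens=128,
    temperature=0.1,
    top_p=0.75,
    top_k=40,
    num_beams=4,
    repetition_penalty=1.7,
    max_length=512,
):
    prompt = create_prompt(instruction, input)
    # Tokenize the prompt, truncating it to at most max_length tokens
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=max_length)
    input_ids = inputs["input_ids"].to("cuda")
    attention_mask = inputs["attention_mask"].to("cuda")

    generation_output = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        max_new_tokens=max_new_tokens,  # cap the answer length, independent of the prompt
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
        # Without do_sample=True, num_beams > 1 means beam search,
        # so temperature, top_p and top_k do not affect the output
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        num_beams=num_beams,
        repetition_penalty=repetition_penalty,
        length_penalty=0.8,
        early_stopping=True,
        output_scores=True,
        return_dict_in_generate=True,
    )

    output = tokenizer.decode(generation_output.sequences[0], skip_special_tokens=True)
    # Keep only the text generated after the response marker
    return output.split("### Resposta:")[1]


instruction = "Descrever como funcionam os computadores quânticos."  # "Describe how quantum computers work."
print("Instrução:", instruction)
print("Resposta:", generate(instruction))
```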

Output

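```text
Instrução: Descrever como funcionam os computadores quânticos.
Resposta: 
Os computadores quânticos são um tipo de computador cuja arquitetura é baseada na mecânica quântica. Os computadores quânticos são capazes de realizar operações matemáticas complexas em um curto espaço de tempo.
```

English translation: Instruction: "Describe how quantum computers work." Response: "Quantum computers are a type of computer whose architecture is based on quantum mechanics. Quantum computers are capable of performing complex mathematical operations in a short amount of time."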
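The prompt templates and the answer extraction used in the usage example can be checked on a CPU without downloading the model. The sketch below copies the two templates verbatim; `extract_response` is a hypothetical helper, not part of the original code. Unlike the plain `split` in the example, it also copes with a missing marker and drops any further `###` section the model might generate after its answer.

```python
# CPU-only check of the prompt format used by Caramelinho.
# The templates are copied from the usage example; extract_response is a
# hypothetical helper, not part of the original code.

PROMPT_INPUT = (
    "Abaixo está uma declaração que descreve uma tarefa, juntamente com uma entrada "
    "que fornece mais contexto. Escreva uma resposta que conclua corretamente a "
    "solicitação.\n\n ### Instrução:\n{instruction}\n\n### Entrada:\n{input}\n\n"
    "### Resposta:\n"
)
PROMPT_NO_INPUT = (
    "Abaixo está uma instrução que descreve uma tarefa. Escreva uma resposta que "
    "conclua corretamente a solicitação.\n\n### Instrução:\n{instruction}\n\n"
    "### Resposta:\n"
)
RESPONSE_MARKER = "### Resposta:"


def create_prompt(instruction, input_text=None):
    """Use the template with context only when input_text is non-empty."""
    if input_text:
        return PROMPT_INPUT.format(instruction=instruction, input=input_text)
    return PROMPT_NO_INPUT.format(instruction=instruction)


def extract_response(decoded):
    """Return the answer after the first response marker, cut at any later
    '###' section; fall back to the whole text if the marker is missing."""
    _, sep, tail = decoded.partition(RESPONSE_MARKER)
    if not sep:
        return decoded.strip()
    return tail.split("###", 1)[0].strip()


if __name__ == "__main__":
    # "Translate into English." with extra input, as in a language-learning use case
    prompt = create_prompt("Traduza para o inglês.", "Bom dia, tudo bem?")
    print(prompt)
    simulated_output = prompt + "Good morning, how are you?"  # stands in for model output
    print(extract_response(simulated_output))
```

Passing an `input_text` selects the template with the `### Entrada:` section, which suits tasks such as translation where the text to work on is separate from the instruction.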

✨ Features

This adapter was created with the PEFT library by fine-tuning the base model Falcon-7B on the Canarim dataset with the QLoRA method.
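The adapter produced this way is small because LoRA, the technique QLoRA builds on, leaves each targeted weight matrix W (d_out × d_in) frozen and learns its update as the product of two thin matrices, B (d_out × r) and A (r × d_in). Only r·(d_in + d_out) values are trained per adapted layer instead of d_in·d_out. The sketch below counts both; the 4096 × 4096 layer and rank r = 16 are illustrative assumptions, since the card does not state Caramelinho's actual LoRA settings.

```python
# Count trainable parameters for a LoRA adapter on one linear layer.
# Assumption: the layer size and rank used below are illustrative only.


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in a dense weight matrix W of shape (d_out, d_in)."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in the LoRA factors B (d_out x r) and A (r x d_in)."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    d_in, d_out, r = 4096, 4096, 16  # hypothetical layer size and rank
    full = full_params(d_in, d_out)
    lora = lora_params(d_in, d_out, r)
    print(f"full matrix:  {full:,} parameters")
    print(f"LoRA (r={r}): {lora:,} parameters ({100 * lora / full:.2f}% of full)")
```

For these sizes the adapter trains 131,072 values against 16,777,216 in the full matrix, about 0.78%, which is why the released adapter is far smaller than Falcon-7B itself.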

📚 Documentation

Model description

Base model: Falcon-7B.

Intended uses & limitations

TBA

Training and evaluation data

TBA

Training results

TBA

📦 Installation

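You can install the required frameworks with pip, for example as follows. The PyTorch CUDA 11.8 build is served from PyTorch's own wheel index, and because Transformers 4.30.0.dev0 was an unreleased development build, the nearest PyPI release (4.30.0) is assumed here:

```shell
pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118
pip install transformers==4.30.0 datasets==2.12.0 tokenizers==0.13.3
pip install peft bitsandbytes accelerate
```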

📄 License

No license information is provided for this model.
