🚀 Mambarim-110M
The first Portuguese language model based on a state-space model architecture (Mamba), not a transformer.
🚀 Quick Start
You need to install `transformers` from `main` until `transformers=4.39.0` is released:

```bash
pip install git+https://github.com/huggingface/transformers@main
```

We also recommend installing both `causal-conv1d` and `mamba-ssm`:

```bash
pip install "causal-conv1d>=1.2.0"
pip install mamba-ssm
```
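As a quick sanity check, you can confirm the `transformers` version and whether the optional CUDA kernels were picked up. This is a minimal sketch, not part of the original instructions; without `mamba-ssm` and `causal-conv1d`, generation should still work through a slower pure-PyTorch fallback:

```python
# Minimal sketch: verify the environment before loading the model.
# The optimized kernels (mamba-ssm / causal-conv1d) are optional; without them,
# transformers is expected to fall back to a slower pure-PyTorch path.
import importlib.util

import transformers

print("transformers version:", transformers.__version__)

for pkg in ("mamba_ssm", "causal_conv1d"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'available' if found else 'not installed (slower fallback expected)'}")
```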
✨ Features
Mambarim-110M is the first Portuguese language model based on a state-space model architecture (Mamba), not a transformer.
📚 Documentation
Details
- Architecture: a Mamba model pre-trained via causal language modeling
- Size: 119,930,880 parameters
- Context length: 2048 tokens
- Dataset: Pt-Corpus Instruct (6.2B tokens)
- Language: Portuguese
- Number of steps: 758,423
This repository contains the source code used to train this model.
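The figures above can be checked against the published checkpoint. A minimal sketch, assuming network access to the Hugging Face Hub; the attribute names follow the `MambaConfig` class in current `transformers`:

```python
# Minimal sketch: inspect the published configuration and confirm the parameter count.
from transformers import MambaForCausalLM

model = MambaForCausalLM.from_pretrained("dominguesm/mambarim-110m")

print("hidden_size:", model.config.hidden_size)
print("num_hidden_layers:", model.config.num_hidden_layers)
print("vocab_size:", model.config.vocab_size)
print("parameters:", model.num_parameters())  # should report ~119,930,880
```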
Intended Uses
WIP
Out-of-scope Use
WIP
💻 Usage Examples
Basic Usage
You can use the classic `generate` API:

```python
>>> from transformers import MambaConfig, MambaForCausalLM, AutoTokenizer
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("dominguesm/mambarim-110m")
>>> model = MambaForCausalLM.from_pretrained("dominguesm/mambarim-110m")

>>> input_ids = tokenizer("O Natal é uma", return_tensors="pt")["input_ids"]

>>> out = model.generate(
...     input_ids,
...     repetition_penalty=1.2,
...     temperature=0.8,
...     top_k=50,
...     top_p=0.85,
...     do_sample=True,
...     max_new_tokens=10,
... )
>>> print(tokenizer.batch_decode(out))
["<s> O Natal é uma data em que as pessoas passam horas de lazer e"]
```
📊 Benchmarks
Evaluations on Brazilian Portuguese benchmarks were performed using a Portuguese implementation of the EleutherAI LM Evaluation Harness (created by Eduardo Garcia).
Detailed results can be found here.
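A hypothetical sketch of how such an evaluation could be reproduced, assuming the Portuguese fork keeps the upstream harness's `simple_evaluate` entry point and that task identifiers such as `enem_challenge` and `assin2_rte` match its task registry (both are assumptions, not confirmed by this model card):

```python
# Hypothetical sketch: run a subset of Portuguese tasks against this checkpoint.
# `simple_evaluate` and the task names below are assumptions about the
# Portuguese fork of the EleutherAI LM Evaluation Harness.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=dominguesm/mambarim-110m",
    tasks=["enem_challenge", "assin2_rte"],  # assumed task identifiers
    batch_size=8,
)
print(results["results"])
```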
| Model | Average | ENEM | BLUEX | OAB Exams | ASSIN2 RTE | ASSIN2 STS | FaQuAD NLI | HateBR | PT Hate Speech | tweetSentBR | Architecture |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TeenyTinyLlama-460m | 28.86 | 20.15 | 25.73 | 27.02 | 53.61 | 13 | 46.41 | 33.59 | 22.99 | 17.28 | LlamaForCausalLM |
| TeenyTinyLlama-160m | 28.2 | 19.24 | 23.09 | 22.37 | 53.97 | 0.24 | 43.97 | 36.92 | 42.63 | 11.39 | LlamaForCausalLM |
| MulaBR/Mula-4x160-v0.1 | 26.24 | 21.34 | 25.17 | 25.06 | 33.57 | 11.35 | 43.97 | 41.5 | 22.99 | 11.24 | MixtralForCausalLM |
| TeenyTinyLlama-460m-Chat | 25.49 | 20.29 | 25.45 | 26.74 | 43.77 | 4.52 | 34 | 33.49 | 22.99 | 18.13 | LlamaForCausalLM |
| mambarim-110m | 14.16 | 18.4 | 10.57 | 21.87 | 16.09 | 1.89 | 9.29 | 15.75 | 17.77 | 15.79 | MambaForCausalLM |
| GloriaTA-3B | 4.09 | 1.89 | 3.2 | 5.19 | 0 | 2.32 | 0.26 | 0.28 | 23.52 | 0.19 | GPTNeoForCausalLM |

| Property | Details |
| --- | --- |
| library_name | transformers |
| language | Portuguese |
| license | cc-by-4.0 |
| tags | text-generation, pytorch, LLM, Portuguese, mamba |
| datasets | Pt-Corpus Instruct |
| inference.parameters.repetition_penalty | 1.2 |
| inference.parameters.temperature | 0.8 |
| inference.parameters.top_k | 50 |
| inference.parameters.top_p | 0.85 |
| inference.parameters.max_new_tokens | 150 |
| pipeline_tag | text-generation |
📄 License
The model is released under the CC-BY-4.0 license.