🚀 Mambarim-110M
The first Portuguese language model based on a state-space model architecture (Mamba), not a transformer.
🚀 Quick Start
You need to install `transformers` from `main` until `transformers=4.39.0` is released:

```bash
pip install git+https://github.com/huggingface/transformers@main
```

We also recommend installing both `causal-conv1d` and `mamba-ssm`:

```bash
pip install "causal-conv1d>=1.2.0"
pip install mamba-ssm
```
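As a quick sanity check, you can confirm the `transformers` version and whether the optional CUDA kernels were picked up. This is a minimal sketch, not part of the original instructions; without `mamba-ssm` and `causal-conv1d`, generation should still work through a slower pure-PyTorch fallback:

```python
# Minimal sketch: verify the environment before loading the model.
# The optimized kernels (mamba-ssm / causal-conv1d) are optional; without them,
# transformers is expected to fall back to a slower pure-PyTorch path.
import importlib.util

import transformers

print("transformers version:", transformers.__version__)

for pkg in ("mamba_ssm", "causal_conv1d"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'available' if found else 'not installed (slower fallback expected)'}")
```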
✨ Features
Mambarim-110M is the first Portuguese language model based on a state-space model architecture (Mamba), not a transformer.
📚 Documentation
Details
- Architecture: a Mamba model pre-trained via causal language modeling
- Size: 119,930,880 parameters
- Context length: 2048 tokens
- Dataset: Pt-Corpus Instruct (6.2B tokens)
- Language: Portuguese
- Number of steps: 758,423
This repository contains the source code used to train this model.
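The figures above can be checked against the published checkpoint. A minimal sketch, assuming network access to the Hugging Face Hub; the attribute names follow the `MambaConfig` class in current `transformers`:

```python
# Minimal sketch: inspect the published configuration and confirm the parameter count.
from transformers import MambaForCausalLM

model = MambaForCausalLM.from_pretrained("dominguesm/mambarim-110m")

print("hidden_size:", model.config.hidden_size)
print("num_hidden_layers:", model.config.num_hidden_layers)
print("vocab_size:", model.config.vocab_size)
print("parameters:", model.num_parameters())  # should report ~119,930,880
```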
Intended Uses
WIP
Out-of-scope Use
WIP
💻 Usage Examples
Basic Usage
You can use the classic `generate` API:

```python
>>> from transformers import MambaConfig, MambaForCausalLM, AutoTokenizer
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("dominguesm/mambarim-110m")
>>> model = MambaForCausalLM.from_pretrained("dominguesm/mambarim-110m")

>>> input_ids = tokenizer("O Natal é uma", return_tensors="pt")["input_ids"]

>>> out = model.generate(
...     input_ids,
...     repetition_penalty=1.2,
...     temperature=0.8,
...     top_k=50,
...     top_p=0.85,
...     do_sample=True,
...     max_new_tokens=10,
... )
>>> print(tokenizer.batch_decode(out))
["<s> O Natal é uma data em que as pessoas passam horas de lazer e"]
```
📊 Benchmarks
Evaluations on Brazilian Portuguese benchmarks were performed using a Portuguese implementation of the EleutherAI LM Evaluation Harness (created by Eduardo Garcia).
Detailed results can be found here.
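A hypothetical sketch of how such an evaluation could be reproduced, assuming the Portuguese fork keeps the upstream harness's `simple_evaluate` entry point and that task identifiers such as `enem_challenge` and `assin2_rte` match its task registry (both are assumptions, not confirmed by this model card):

```python
# Hypothetical sketch: run a subset of Portuguese tasks against this checkpoint.
# `simple_evaluate` and the task names below are assumptions about the
# Portuguese fork of the EleutherAI LM Evaluation Harness.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=dominguesm/mambarim-110m",
    tasks=["enem_challenge", "assin2_rte"],  # assumed task identifiers
    batch_size=8,
)
print(results["results"])
```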
| Model | Average | ENEM | BLUEX | OAB Exams | ASSIN2 RTE | ASSIN2 STS | FaQuAD NLI | HateBR | PT Hate Speech | tweetSentBR | Architecture |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TeenyTinyLlama-460m | 28.86 | 20.15 | 25.73 | 27.02 | 53.61 | 13 | 46.41 | 33.59 | 22.99 | 17.28 | LlamaForCausalLM |
| TeenyTinyLlama-160m | 28.2 | 19.24 | 23.09 | 22.37 | 53.97 | 0.24 | 43.97 | 36.92 | 42.63 | 11.39 | LlamaForCausalLM |
| MulaBR/Mula-4x160-v0.1 | 26.24 | 21.34 | 25.17 | 25.06 | 33.57 | 11.35 | 43.97 | 41.5 | 22.99 | 11.24 | MixtralForCausalLM |
| TeenyTinyLlama-460m-Chat | 25.49 | 20.29 | 25.45 | 26.74 | 43.77 | 4.52 | 34 | 33.49 | 22.99 | 18.13 | LlamaForCausalLM |
| mambarim-110m | 14.16 | 18.4 | 10.57 | 21.87 | 16.09 | 1.89 | 9.29 | 15.75 | 17.77 | 15.79 | MambaForCausalLM |
| GloriaTA-3B | 4.09 | 1.89 | 3.2 | 5.19 | 0 | 2.32 | 0.26 | 0.28 | 23.52 | 0.19 | GPTNeoForCausalLM |

| Property | Details |
| --- | --- |
| library_name | transformers |
| language | Portuguese |
| license | cc-by-4.0 |
| tags | text-generation, pytorch, LLM, Portuguese, mamba |
| datasets | Pt-Corpus Instruct |
| inference.parameters.repetition_penalty | 1.2 |
| inference.parameters.temperature | 0.8 |
| inference.parameters.top_k | 50 |
| inference.parameters.top_p | 0.85 |
| inference.parameters.max_new_tokens | 150 |
| pipeline_tag | text-generation |
📄 License
The model is released under the CC-BY-4.0 license.