# Mambaoutai 1.6B
Mambaoutai is the outcome of all the experiments and training runs detailed in the accompanying blog post, where all the details about the model series are shared. It is a series of small Mamba checkpoints released for the community to explore, trained on French, English, and code. We conducted two different decay phases with the WSD scheduler and released model checkpoints pretrained both with and without instruction data.
## Quick Start

### Installation
You need to install `transformers` from `main` until `transformers=4.39.0` is released:

```bash
pip install git+https://github.com/huggingface/transformers@main
```
We also recommend that you install both `causal-conv1d` and `mamba-ssm` using:

```bash
pip install "causal-conv1d>=1.2.0"
pip install "mamba-ssm>=1.2.0"
```
If either of these two packages is not installed, the "eager" implementation will be used, which is not recommended. Otherwise, the more optimized CUDA kernels will be used.
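If you want to check which path will be taken, a minimal sketch (our own addition, not part of the official instructions) is to test whether the two kernel packages are importable before loading the model:

```python
# Sketch: check whether the optimized CUDA kernel packages are importable.
# If either is missing, transformers falls back to the slower "eager" path.
import importlib.util

for pkg in ("mamba_ssm", "causal_conv1d"):
    if importlib.util.find_spec(pkg) is None:
        print(f"{pkg} is not installed; the eager implementation will be used.")
    else:
        print(f"{pkg} is available; the optimized kernels can be used.")
```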
## Usage Examples

### Basic Usage
Use this snippet of code to generate text from the model:
```python
from transformers import MambaForCausalLM, AutoTokenizer

# Set to True if you are using a checkpoint trained with instruction data
model_has_instruct_data = False

if model_has_instruct_data:
    # instruction-tuned checkpoints expect the chat format
    prompt = "<start_user>Tell me something about Paris.<end_message><start_assistant>"
else:
    # plain pretrained checkpoints work with free-form text
    prompt = "This is a text about Paris. Paris is"

tokenizer = AutoTokenizer.from_pretrained("lightonai/mambaoutai")
model = MambaForCausalLM.from_pretrained("lightonai/mambaoutai")

input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
```
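If a GPU is available, a small variant of the same snippet (a sketch assuming a CUDA device) loads the weights in bfloat16, the dtype used for training, and moves everything to the device:

```python
import torch
from transformers import MambaForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("lightonai/mambaoutai")
# Load the weights in bfloat16 and place the model on the GPU
model = MambaForCausalLM.from_pretrained(
    "lightonai/mambaoutai", torch_dtype=torch.bfloat16
).to("cuda")

input_ids = tokenizer("This is a text about Paris. Paris is", return_tensors="pt")["input_ids"].to("cuda")
out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```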
### Advanced Usage

#### Training checkpoints
You can find some of the training checkpoints in the model repository, each on a branch corresponding to the state of the model at a given point during training. You can run inference with these training checkpoints by passing the `revision` parameter to the `from_pretrained` method.
For example, to load the model checkpoint after 30000 steps of pretraining, you can use the following code:
```python
from transformers import MambaForCausalLM, AutoTokenizer

# "pre-30000" is the branch holding the checkpoint after 30,000 pretraining steps
tokenizer = AutoTokenizer.from_pretrained("lightonai/mambaoutai", revision="pre-30000")
model = MambaForCausalLM.from_pretrained("lightonai/mambaoutai", revision="pre-30000")

input_ids = tokenizer("What is a mamba?", return_tensors="pt")["input_ids"]
out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
```
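To see which checkpoint branches exist, one option (a sketch assuming the `huggingface_hub` client is installed) is to list the repository refs:

```python
from huggingface_hub import list_repo_refs

# Training checkpoints live on branches such as "pre-30000"
refs = list_repo_refs("lightonai/mambaoutai")
print([branch.name for branch in refs.branches])
```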
#### On-device Inference
Since Mambaoutai has only 1.6B parameters, it can be run on a CPU at reasonable speed. Here is an example of how to run it with llama.cpp:
```bash
# Build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Set up a Python environment for the GGUF conversion script
conda create -n mamba-cpp python=3.10
conda activate mamba-cpp
pip install -r requirements/requirements-convert-hf-to-gguf.txt

# Place the Hugging Face checkpoint in Mambaoutai/, then convert it to GGUF
mkdir Mambaoutai
python convert-hf-to-gguf.py Mambaoutai

# Run generation with the converted model
./main -m Mambaoutai/ggml-model-f16.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e -ngl 1
```
## Technical Details

### Training Hardware
The model checkpoints with no instruction data were fully trained on an NVIDIA DGX H100 provided by OVH Cloud, whereas the decay phases with instruction data were carried out on an HPE Cray with 8x H100 on Orange Cloud Avenue. The ablation experiments were conducted on 16 nodes (4x A100 40GB each) on MeluXina.
### Model hyperparameters
More details about the model hyperparameters are given in the table below:
| Property | Details |
|---|---|
| d_model | 2688 |
| n_layer | 28 |
| vocab_size | 65024 |
| context_len | 4096 |
| rms_norm | true |
| residual_in_fp32 | true |
| fused_add_norm | true |
| conv_kernel | 4 |
| d_inner | 5376 |
| state_size | 16 |
| dtype | bfloat16 |
| tie_word_embeddings | false |
| non-embedding params | 1.27B |
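Most of these values map onto the `MambaConfig` exposed by `transformers` (for instance `d_model` corresponds to `hidden_size` and `n_layer` to `num_hidden_layers`; this mapping is our reading, so verify it against the released config). A minimal sketch to inspect the configuration:

```python
from transformers import MambaConfig

# Load the released configuration and print the main architecture fields
config = MambaConfig.from_pretrained("lightonai/mambaoutai")
print(config.hidden_size)          # d_model, expected 2688
print(config.num_hidden_layers)    # n_layer, expected 28
print(config.vocab_size)           # expected 65024
print(config.state_size)           # SSM state size, expected 16
print(config.conv_kernel)          # expected 4
print(config.tie_word_embeddings)  # expected False
```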
## License
The model is licensed under Apache-2.0.
## Additional Information
- Datasets: togethercomputer/RedPajama-Data-V2, stingning/ultrachat
- Languages: fr, en
- Metrics: accuracy, perplexity
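Since perplexity is listed as an evaluation metric, here is a minimal sketch (not the exact evaluation pipeline used for the model) of how to compute it as the exponential of the average cross-entropy loss:

```python
import torch
from transformers import MambaForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("lightonai/mambaoutai")
model = MambaForCausalLM.from_pretrained("lightonai/mambaoutai")
model.eval()

text = "Paris is the capital of France."
input_ids = tokenizer(text, return_tensors="pt")["input_ids"]

with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy loss;
    # perplexity is its exponential.
    loss = model(input_ids, labels=input_ids).loss

print(f"Perplexity: {torch.exp(loss).item():.2f}")
```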