# Mambaoutai 1.6B
Mambaoutai is the outcome of all the experiments and training runs detailed in the accompanying blog post, where all the details about the model series are shared. It is a series of small Mamba checkpoints released for the community to explore, trained on French, English, and code. We conducted two different decay phases with the WSD scheduler and released model checkpoints pretrained both with and without instruction data.
## Quick Start

### Installation
You need to install `transformers` from `main` until `transformers=4.39.0` is released:

```bash
pip install git+https://github.com/huggingface/transformers@main
```
We also recommend that you install both `causal-conv1d` and `mamba-ssm` using:

```bash
pip install "causal-conv1d>=1.2.0"
pip install "mamba-ssm>=1.2.0"
```
If either of these two packages is not installed, the "eager" implementation will be used, which is not recommended. Otherwise, the more optimized CUDA kernels will be used.
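If you want to check which path will be taken, a minimal sketch (our own addition, not part of the official instructions) is to test whether the two kernel packages are importable before loading the model:

```python
# Sketch: check whether the optimized CUDA kernel packages are importable.
# If either is missing, transformers falls back to the slower "eager" path.
import importlib.util

for pkg in ("mamba_ssm", "causal_conv1d"):
    if importlib.util.find_spec(pkg) is None:
        print(f"{pkg} is not installed; the eager implementation will be used.")
    else:
        print(f"{pkg} is available; the optimized kernels can be used.")
```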
## Usage Examples

### Basic Usage
Use this snippet of code to generate text from the model:
```python
from transformers import MambaForCausalLM, AutoTokenizer

# Set to True if you are using a checkpoint trained with instruction data
model_has_instruct_data = False

if model_has_instruct_data:
    # instruction-tuned checkpoints expect the chat format
    prompt = "<start_user>Tell me something about Paris.<end_message><start_assistant>"
else:
    # plain pretrained checkpoints work with free-form text
    prompt = "This is a text about Paris. Paris is"

tokenizer = AutoTokenizer.from_pretrained("lightonai/mambaoutai")
model = MambaForCausalLM.from_pretrained("lightonai/mambaoutai")

input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
```
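If a GPU is available, a small variant of the same snippet (a sketch assuming a CUDA device) loads the weights in bfloat16, the dtype used for training, and moves everything to the device:

```python
import torch
from transformers import MambaForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("lightonai/mambaoutai")
# Load the weights in bfloat16 and place the model on the GPU
model = MambaForCausalLM.from_pretrained(
    "lightonai/mambaoutai", torch_dtype=torch.bfloat16
).to("cuda")

input_ids = tokenizer("This is a text about Paris. Paris is", return_tensors="pt")["input_ids"].to("cuda")
out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```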
### Advanced Usage

#### Training checkpoints
You can find some of the training checkpoints in the model repository, each on a branch corresponding to the state of the model at a given point during training. You can run inference with these training checkpoints by passing the `revision` parameter to the `from_pretrained` method.
For example, to load the model checkpoint after 30000 steps of pretraining, you can use the following code:
```python
from transformers import MambaForCausalLM, AutoTokenizer

# "pre-30000" is the branch holding the checkpoint after 30,000 pretraining steps
tokenizer = AutoTokenizer.from_pretrained("lightonai/mambaoutai", revision="pre-30000")
model = MambaForCausalLM.from_pretrained("lightonai/mambaoutai", revision="pre-30000")

input_ids = tokenizer("What is a mamba?", return_tensors="pt")["input_ids"]
out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
```
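To see which checkpoint branches exist, one option (a sketch assuming the `huggingface_hub` client is installed) is to list the repository refs:

```python
from huggingface_hub import list_repo_refs

# Training checkpoints live on branches such as "pre-30000"
refs = list_repo_refs("lightonai/mambaoutai")
print([branch.name for branch in refs.branches])
```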
#### On-device Inference
Since Mambaoutai has only 1.6B parameters, it can be run on a CPU at reasonable speed. Here is an example of how to run it with llama.cpp:
```bash
# Build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Set up a Python environment for the GGUF conversion script
conda create -n mamba-cpp python=3.10
conda activate mamba-cpp
pip install -r requirements/requirements-convert-hf-to-gguf.txt

# Place the Hugging Face checkpoint in Mambaoutai/, then convert it to GGUF
mkdir Mambaoutai
python convert-hf-to-gguf.py Mambaoutai

# Run generation with the converted model
./main -m Mambaoutai/ggml-model-f16.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e -ngl 1
```
## Technical Details

### Training Hardware
The model checkpoints with no instruction data were fully trained on an NVIDIA DGX H100 provided by OVH Cloud, whereas the decay phases with instruction data were carried out on an HPE Cray with 8x H100 on Orange Cloud Avenue. The ablation experiments were conducted on 16 nodes (4x A100 40GB each) on MeluXina.
### Model hyperparameters
More details about the model hyperparameters are given in the table below:
| Property | Details |
|---|---|
| d_model | 2688 |
| n_layer | 28 |
| vocab_size | 65024 |
| context_len | 4096 |
| rms_norm | true |
| residual_in_fp32 | true |
| fused_add_norm | true |
| conv_kernel | 4 |
| d_inner | 5376 |
| state_size | 16 |
| dtype | bfloat16 |
| tie_word_embeddings | false |
| non-embedding params | 1.27B |
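Most of these values map onto the `MambaConfig` exposed by `transformers` (for instance `d_model` corresponds to `hidden_size` and `n_layer` to `num_hidden_layers`; this mapping is our reading, so verify it against the released config). A minimal sketch to inspect the configuration:

```python
from transformers import MambaConfig

# Load the released configuration and print the main architecture fields
config = MambaConfig.from_pretrained("lightonai/mambaoutai")
print(config.hidden_size)          # d_model, expected 2688
print(config.num_hidden_layers)    # n_layer, expected 28
print(config.vocab_size)           # expected 65024
print(config.state_size)           # SSM state size, expected 16
print(config.conv_kernel)          # expected 4
print(config.tie_word_embeddings)  # expected False
```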
## License
The model is licensed under Apache-2.0.
## Additional Information
- Datasets: togethercomputer/RedPajama-Data-V2, stingning/ultrachat
- Languages: fr, en
- Metrics: accuracy, perplexity
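Since perplexity is listed as an evaluation metric, here is a minimal sketch (not the exact evaluation pipeline used for the model) of how to compute it as the exponential of the average cross-entropy loss:

```python
import torch
from transformers import MambaForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("lightonai/mambaoutai")
model = MambaForCausalLM.from_pretrained("lightonai/mambaoutai")
model.eval()

text = "Paris is the capital of France."
input_ids = tokenizer(text, return_tensors="pt")["input_ids"]

with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy loss;
    # perplexity is its exponential.
    loss = model(input_ids, labels=input_ids).loss

print(f"Perplexity: {torch.exp(loss).item():.2f}")
```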