Mamba-370M-HF Open-Source Language Model - Free Deployment for Efficient Sequence Modeling

Mamba 370m Hf

Developed by state-spaces

Mamba is an efficient language model built on a state space model (SSM) architecture. Its cost grows linearly with sequence length.

Large Language Model

Transformers

#Efficient text generation #Lightweight fine-tuning #Low-latency inference

Downloads 6,895

Release Time: 3/6/2024

Model Overview

Mamba is a language model compatible with HuggingFace Transformers. It uses a state space architecture whose cost grows linearly with sequence length, which makes it well suited to long-sequence tasks.
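The linear-time claim can be made concrete with a toy, single-channel sketch of a selective state space recurrence in plain Python. This is a simplification, not the mamba-ssm kernel. The diagonal state matrix `a`, the weights `w_delta`, `w_b` and `w_c`, the softplus step size, and the simplified discretization of B are all illustrative assumptions.

The point is that each token updates a fixed-size hidden state exactly once. The cost therefore grows linearly with sequence length, rather than quadratically as in full attention.

```python
import math
from typing import List, Sequence


def softplus(v: float) -> float:
    # Numerically stable log(1 + exp(v)).
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def selective_scan(
    xs: Sequence[float],
    a: Sequence[float],
    w_delta: float,
    w_b: Sequence[float],
    w_c: Sequence[float],
) -> List[float]:
    """Toy single-channel selective SSM with hypothetical parameters.

    xs: input sequence; a: diagonal state matrix (negative entries);
    w_delta, w_b, w_c: weights that make the step size, B and C depend on the input.
    Runs in O(len(xs) * len(a)) time and keeps an O(len(a)) state.
    """
    n = len(a)
    h = [0.0] * n  # fixed-size hidden state, independent of sequence length
    ys: List[float] = []
    for x in xs:
        delta = softplus(w_delta * x)       # input-dependent step size
        b = [w * x for w in w_b]            # input-dependent B_t
        c = [w * x for w in w_c]            # input-dependent C_t
        for i in range(n):
            a_bar = math.exp(delta * a[i])  # zero-order-hold discretization of A
            b_bar = delta * b[i]            # simplified (Euler) discretization of B
            h[i] = a_bar * h[i] + b_bar * x  # h_t = A_bar * h_{t-1} + B_bar * x_t
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))  # y_t = C_t . h_t
    return ys


if __name__ == "__main__":
    print(selective_scan([1.0, 2.0, 0.5], a=[-1.0, -0.5], w_delta=0.3, w_b=[1.0, 0.5], w_c=[0.2, 1.0]))
```

One pass over the input is all that is needed. Doubling the sequence length doubles the work, and the memory held between tokens stays the same.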

Model Features

Efficient sequence modeling

Uses a state space architecture that processes sequences in time linear in the sequence length

CUDA optimization

Supports optimized CUDA kernels (via the optional mamba-ssm and causal-conv1d packages) for faster inference

Compatible with Transformers

Fully compatible with the HuggingFace Transformers ecosystem

Model Capabilities

Text generation

Language modeling

Long sequence processing

Use Cases

Text generation

Dialogue generation

Generate coherent dialogue responses

The usage example shows the model continuing a conversational prompt

Content creation

Assist in writing and creative content generation

🚀 Mamba

This repository contains the mamba-370m model in a transformers-compatible format. The checkpoints are unchanged; the full config.json and tokenizer have been added to this repository.

🚀 Quick Start

To use this model, install the following packages.

📦 Installation

First, install transformers from the main branch until transformers 4.39.0 is officially released:

pip install git+https://github.com/huggingface/transformers@main
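If you are unsure whether your installed transformers already includes Mamba, compare its version against 4.39.0. The helper below is a minimal sketch, and its function names are hypothetical. It only looks at the numeric release part of the version string, so a main-branch build such as `4.39.0.dev0` also counts as supported.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

MAMBA_MIN_VERSION = (4, 39, 0)  # first transformers release with Mamba, per this card


def release_tuple(ver: str) -> Tuple[int, ...]:
    """Extract the numeric release part, e.g. '4.39.0.dev0' -> (4, 39, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    return parts + (0,) * (3 - len(parts))  # pad '4.39' to (4, 39, 0)


def supports_mamba(ver: str) -> bool:
    return release_tuple(ver) >= MAMBA_MIN_VERSION


def installed_transformers_version() -> Optional[str]:
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    installed = installed_transformers_version()
    if installed is None:
        print("transformers is not installed")
    else:
        print(installed, "supports Mamba:", supports_mamba(installed))
```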

We also suggest installing both causal-conv1d and mamba-ssm with the following commands:

pip install "causal-conv1d>=1.2.0"
pip install mamba-ssm

If either of these packages is not installed, the "eager" implementation is used; otherwise, the optimized CUDA kernels are used.
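A quick way to predict which path will be taken is to check whether both packages can be imported. This rough sketch uses only the standard library (`importlib.util.find_spec`), and the helper names are hypothetical. Transformers applies its own checks as well; for example, the optimized kernels are only used when the model runs on a CUDA device. Treat the result as a hint rather than a guarantee.

```python
import importlib.util
from typing import Dict

# Import names of the optional kernel packages (pip names: mamba-ssm, causal-conv1d).
KERNEL_PACKAGES = ("mamba_ssm", "causal_conv1d")


def kernel_packages_status() -> Dict[str, bool]:
    """Map each optional package to whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in KERNEL_PACKAGES}


def expected_implementation() -> str:
    # Both packages are needed for the optimized path; otherwise "eager" is used.
    return "cuda kernels" if all(kernel_packages_status().values()) else "eager"


if __name__ == "__main__":
    print(kernel_packages_status(), "->", expected_implementation())
```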

💻 Usage Examples

Basic Usage

You can use the standard generate API for text generation:

>>> from transformers import MambaConfig, MambaForCausalLM, AutoTokenizer
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("state-spaces/mamba-370m-hf")
>>> model = MambaForCausalLM.from_pretrained("state-spaces/mamba-370m-hf")
>>> input_ids = tokenizer("Hey how are you doing?", return_tensors="pt")["input_ids"]

>>> out = model.generate(input_ids, max_new_tokens=10)
>>> print(tokenizer.batch_decode(out))
["Hey how are you doing?\n\nI'm doing great.\n\nI"]

Advanced Usage

Here is an example of fine-tuning the model with LoRA using the peft library. We recommend keeping the model in float32 during fine-tuning:

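from datasets import load_dataset
from trl import SFTTrainer
from peft import LoraConfig
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments

# Load the tokenizer and model; keep the weights in float32 for fine-tuning (see the note above).
tokenizer = AutoTokenizer.from_pretrained("state-spaces/mamba-370m-hf")
model = AutoModelForCausalLM.from_pretrained("state-spaces/mamba-370m-hf")

# Training data: the text to learn from is in the "quote" column.
dataset = load_dataset("Abirate/english_quotes", split="train")

# Optimization and logging settings.
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    logging_dir="./logs",
    logging_steps=10,
    learning_rate=2e-3,
)

# Rank-8 LoRA adapters on the embeddings and Mamba's projection layers.
lora_config = LoraConfig(
    r=8,
    target_modules=["x_proj", "embeddings", "in_proj", "out_proj"],
    task_type="CAUSAL_LM",
    bias="none",
)

# This matches the SFTTrainer signature of trl releases from early 2024. Newer trl
# releases take dataset_text_field through SFTConfig and rename tokenizer= to processing_class=.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    peft_config=lora_config,
    train_dataset=dataset,
    dataset_text_field="quote",
)
trainer.train()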
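For intuition about what LoraConfig(r=8, ...) adds, here is a plain-Python sketch of a LoRA-augmented linear layer. The frozen weight W is left untouched, and a trainable low-rank update B·A of rank r is added, scaled by alpha / r. B starts at zero (the standard LoRA initialization), so the adapted layer initially matches the base layer. This illustrates the technique, not peft's implementation, and the function names are hypothetical.

The parameter count shows why this is cheap. For an assumed 1024 → 4096 projection, rank 8 trains 8 × (1024 + 4096) = 40,960 weights instead of 4,194,304.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def init_lora(d_in: int, d_out: int, r: int, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Return (A, B): A is r x d_in (small random values), B is d_out x r (zeros)."""
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
    b = [[0.0] * r for _ in range(d_out)]
    return a, b


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: List[float], alpha: float, r: int) -> List[float]:
    """y = W x + (alpha / r) * B (A x); W stays frozen, only A and B are trained."""
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    scale = alpha / r
    return [y0 + scale * u for y0, u in zip(base, update)]


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    return r * (d_in + d_out)  # A has r * d_in entries, B has d_out * r


if __name__ == "__main__":
    print(lora_trainable_params(1024, 4096, 8))  # prints 40960
```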
