MAMBA (2.8B) fine-tuned on OpenHermes
This model is MAMBA (2.8B) fine-tuned on the OpenHermes dataset, aiming to enhance its text generation capabilities.
Quick Start
Installation
pip install torch==2.1.0 transformers==4.35.0 causal-conv1d==1.0.0 mamba-ssm==1.0.1
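If the install succeeds, a quick sanity check (a sketch, assuming a CUDA-capable GPU is available) is:

```python
import torch
import mamba_ssm  # fails here if causal-conv1d / mamba-ssm did not build correctly

print(torch.cuda.is_available())  # True is expected for fp16 inference on GPU
print(mamba_ssm.__version__)      # should report the pinned version installed above
```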
Basic Usage
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

CHAT_TEMPLATE_ID = "HuggingFaceH4/zephyr-7b-beta"

device = "cuda:0" if torch.cuda.is_available() else "cpu"
model_name = "clibrain/mamba-2.8b-instruct-openhermes"
eos_token = "<|endoftext|>"

# Tokenizer: reuse Zephyr's chat template so apply_chat_template renders the prompt format
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.eos_token = eos_token
tokenizer.pad_token = tokenizer.eos_token
tokenizer.chat_template = AutoTokenizer.from_pretrained(CHAT_TEMPLATE_ID).chat_template

# Load the Mamba LM head model in fp16 on the selected device
model = MambaLMHeadModel.from_pretrained(model_name, device=device, dtype=torch.float16)

# Build a single-turn chat and render it with the chat template
messages = []
prompt = "Tell me 5 sites to visit in Spain"
messages.append(dict(role="user", content=prompt))

input_ids = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(device)

# Sample a completion
out = model.generate(
    input_ids=input_ids,
    max_length=2000,
    temperature=0.9,
    top_p=0.7,
    eos_token_id=tokenizer.eos_token_id,
)

# Keep only the assistant's turn and strip the EOS token
decoded = tokenizer.batch_decode(out)
assistant_message = decoded[0].split("<|assistant|>\n")[-1].replace(eos_token, "")
print(assistant_message)
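To continue the conversation over multiple turns, one option is to append the assistant's reply back into `messages` before generating again; the follow-up question below is just an example, reusing the same setup as above.

```python
# Feed the previous reply back in, then ask a follow-up question
messages.append(dict(role="assistant", content=assistant_message))
messages.append(dict(role="user", content="Which of those is best to visit in winter?"))

input_ids = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(device)
out = model.generate(
    input_ids=input_ids,
    max_length=2000,
    temperature=0.9,
    top_p=0.7,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.batch_decode(out)[0].split("<|assistant|>\n")[-1].replace(eos_token, ""))
```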
Gradio Demo
git clone https://github.com/mrm8488/mamba-chat.git
cd mamba-chat
pip install -r requirements.txt
pip install -q gradio==4.8.0
python app.py \
--model clibrain/mamba-2.8b-instruct-openhermes \
--share
Features
Base Model Features
Mamba is a new state space model architecture showing promising performance on information-dense data such as language modeling, where previous subquadratic models fall short of Transformers. It is based on the line of progress on structured state space models, with an efficient hardware-aware design and implementation in the spirit of FlashAttention.
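For intuition only, the sketch below shows a toy, non-selective state space recurrence in plain PyTorch: the hidden state is updated once per token, so the scan runs in time linear in sequence length with a fixed-size state. The `toy_ssm_scan` helper and its shapes are illustrative assumptions, not the actual `mamba_ssm` kernels, which are selective (input-dependent) and hardware-aware.

```python
import torch

def toy_ssm_scan(x, A, B, C):
    # x: (seq_len, d_in); A: (d_state, d_state); B: (d_state, d_in); C: (d_out, d_state)
    h = torch.zeros(A.shape[0])
    ys = []
    for x_t in x:                # one step per token -> O(seq_len), constant-size state
        h = A @ h + B @ x_t      # update the hidden state
        ys.append(C @ h)         # read out an output for this step
    return torch.stack(ys)

y = toy_ssm_scan(torch.randn(16, 4), 0.9 * torch.eye(8), torch.randn(8, 4), torch.randn(4, 8))
print(y.shape)  # torch.Size([16, 4])
```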
Dataset Features
The OpenHermes dataset is composed of 242,000 entries of primarily GPT-4 generated data, from open datasets across the AI landscape, including:
- GPTeacher - General Instruct, Roleplay v1, Roleplay v2, and Code Instruct Datasets, by Teknium
- WizardLM (v1, evol_instruct 70k), by WizardLM Team/nlpxucan
- Airoboros GPT-4 (v1.0), by JonDurbin
- Camel-AI's domain expert datasets, by the Camel-AI Team
- CodeAlpaca, by Sahil2801
- GPT4-LLM and Unnatural Instructions, by Microsoft
Filtering included the removal of OpenAI refusals, disclaimers, and "As an AI"-style examples, among others. The base dataset mix is identical to the original Nous-Hermes mix, minus the Nous-Instruct and PDACTL datasets, which were private.
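To inspect the data yourself, it can be loaded with the `datasets` library; the `teknium/openhermes` repository id below is an assumption about where the public release is hosted.

```python
from datasets import load_dataset

# Assumed Hugging Face repo id for the public OpenHermes release
ds = load_dataset("teknium/openhermes", split="train")
print(len(ds))  # roughly 242k entries
print(ds[0])    # one instruction-style example
```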
Documentation
Model Card
The model card is still a work in progress!
Evaluations
Evaluations are coming soon!
License
This project is licensed under the WTFPL.
Acknowledgments
Thanks to mamba-chat for heavily inspiring our work.