mamba-7b-rw: an open-source natural language processing model trained for multiple epochs

Mamba 7b Rw

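Developed by TRI-ML

Mamba-7B is a 7-billion-parameter model based on the Mamba architecture, trained for multiple epochs (1.2 trillion tokens) on the RefinedWeb dataset. Mamba is a state space model that does not use self-attention, and it performs well on a range of natural language benchmarks.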

Large language model

Safetensors

Language: English. License: Apache-2.0. Tags: #state-space-model #efficient-text-generation #attention-free

Downloads: 188

Released: 4/8/2024

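Model overview

Mamba-7B is an autoregressive language model built on the Mamba architecture and designed for text generation. It was trained for 1.2 trillion tokens over multiple epochs of the RefinedWeb dataset and supports English.

Model features

Based on the Mamba architecture

Mamba is a state space model that does not use self-attention. Its compute cost grows linearly with sequence length, which makes inference efficient.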
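To make the linear-time claim concrete, here is a minimal sketch of a discretized state space recurrence in plain Python. It is not Mamba's selective scan, which makes the parameters input-dependent and runs as a hardware-aware parallel scan. The function name `ssm_scan` and the fixed parameters `a`, `b`, `c` are illustrative assumptions. The point is only that each token updates a fixed-size state once, so the work grows linearly with sequence length.

```python
from typing import List, Sequence


def ssm_scan(xs: Sequence[float], a: Sequence[float], b: Sequence[float],
             c: Sequence[float]) -> List[float]:
    """Run a diagonal linear state space recurrence over a 1-D input.

    h_t = a * h_{t-1} + b * x_t   (elementwise over the state dimension)
    y_t = sum(c * h_t)

    a, b, c are hypothetical fixed parameters of equal length (the state size).
    """
    state = [0.0] * len(a)  # fixed-size hidden state, independent of sequence length
    ys = []
    for x in xs:  # a single pass over the sequence: O(len(xs)) updates
        state = [a_i * h_i + b_i * x for a_i, b_i, h_i in zip(a, b, state)]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, state)))
    return ys


if __name__ == "__main__":
    # Impulse response of a two-dimensional state with decay rates 0.5 and 0.9.
    print(ssm_scan([1.0, 0.0, 0.0], a=[0.5, 0.9], b=[1.0, 1.0], c=[1.0, 1.0]))
```

During generation, a model of this kind only needs to carry `state` forward from one token to the next, rather than a cache that grows with every generated token.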

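Large-scale training data

Trained on 1.2 trillion tokens from the RefinedWeb dataset, giving the model broad coverage of natural language tasks.

Efficient inference

Because of the properties of the Mamba architecture, inference is efficient and has a relatively low compute cost.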

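Model capabilities

Text generation

Natural language understanding

Question answering

Use cases

Natural language processing

Text generation

Generates coherent, context-relevant text for applications such as content creation and dialogue systems.

Generated text is generally coherent and relevant to the prompt.

Question answering

Answers user questions in domains such as customer service and education.

Scores 33.3 on MMLU (5-shot).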

🚀 Mamba-7B

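Mamba-7B is a 7-billion-parameter model with the Mamba architecture, trained for multiple epochs (1.2 trillion tokens) on the RefinedWeb dataset. Mamba is a state space model that, unlike the standard Transformer architecture, does not use self-attention. It performs well on a variety of natural language benchmarks. To date, the largest publicly released pure-Mamba pretrained model is Mamba-2.8B. This project follows its training recipe and releases a version of Mamba-7B. The model was trained as a baseline for the paper Linearizing Large Language Models.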

🚀 Quick start

This model was trained with OpenLM, and its weights have been converted to a HuggingFace-compatible format. Usage example:

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the converted weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("tri-ml/mamba-7b-rw")
model = AutoModelForCausalLM.from_pretrained("tri-ml/mamba-7b-rw")

# Tokenize a single prompt into PyTorch tensors.
inputs = tokenizer(["The Toyota Supra"], return_tensors="pt")
# Sampling settings: nucleus sampling with temperature and a mild repetition penalty.
gen_kwargs = {"max_new_tokens": 50, "top_p": 0.8, "temperature": 0.8, "do_sample": True, "repetition_penalty": 1.1}
output = model.generate(inputs['input_ids'], **gen_kwargs)
# Decode the first (and only) generated sequence back to text.
output = tokenizer.decode(output[0].tolist(), skip_special_tokens=True)
print(output)
# Example output (sampling is random, so results will vary):
# The Toyota Supra is a sports car that has been in production since 1978. The car was discontinued in 2002, but it has recently been revived and will be available again in 2020. The Supra has always been known for its powerful engines and agile handling.
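The `temperature` and `top_p` entries in `gen_kwargs` control how the next token is drawn. The sketch below illustrates the idea in plain Python, assuming the usual definition of nucleus sampling; it is a simplified illustration, not the Hugging Face implementation, and it leaves out `repetition_penalty`. The helper names `top_p_filter` and `sample_token` are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional


def top_p_filter(logits: List[float], temperature: float, top_p: float) -> Dict[int, float]:
    """Return {token_id: probability} for the tokens kept by top-p filtering."""
    # Temperature scaling: values below 1.0 sharpen the distribution.
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalize so the kept probabilities sum to 1.
    return {i: probs[i] / mass for i in kept}


def sample_token(logits: List[float], temperature: float = 0.8, top_p: float = 0.8,
                 rng: Optional[random.Random] = None) -> int:
    """Draw one token id from the filtered distribution."""
    rng = rng or random.Random()
    nucleus = top_p_filter(logits, temperature, top_p)
    ids = list(nucleus)
    return rng.choices(ids, weights=[nucleus[i] for i in ids], k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0, -1.0]  # made-up scores for a 4-token vocabulary
    print(top_p_filter(toy_logits, temperature=0.8, top_p=0.8))
    print(sample_token(toy_logits, rng=random.Random(0)))
```

With these toy logits, only the two most likely tokens survive the filter, so low-probability tokens can never be sampled. Lowering `top_p` or `temperature` makes generation more conservative.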

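✨ Key features

Uses the Mamba architecture, which has no self-attention, and performs well on natural language benchmarks.
Trained as the baseline model for the paper Linearizing Large Language Models.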

📦 Installation

The documentation does not list specific installation steps; refer to the OpenLM instructions.

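📚 Documentation

Model details

| Property | Details |
|------------|---------|
| Developer | Toyota Research Institute |
| Model type | Autoregressive language model based on the Mamba architecture |
| Training data | 1.2 trillion tokens of the RefinedWeb dataset |
| Tokenizer | `EleutherAI/gpt-neox-20b` |
| Library | OpenLM |
| License | Apache License, Version 2.0 |

| Parameters | Hidden size | Layers | Vocabulary size | Sequence length |
|------------|---------|---------|---------|---------|
| 7B | 4096 | 64 | 50432 | 2048 |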
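As a rough sanity check on the 7B figure, the sketch below estimates the parameter count from the hidden size, layer count, and vocabulary size listed above. The per-block layout (expansion factor 2, state size 16, convolution width 4, and `dt_rank = ceil(d_model / 16)`) follows the defaults of the Mamba reference implementation and is an assumption here; the model card does not state these values, nor whether the input and output embeddings are tied.

```python
import math


def mamba_block_params(d_model: int, expand: int = 2, d_state: int = 16, d_conv: int = 4) -> int:
    """Parameter count of one Mamba block under assumed reference-implementation defaults."""
    d_inner = expand * d_model
    dt_rank = math.ceil(d_model / 16)
    in_proj = d_model * 2 * d_inner             # input projection to the main and gate branches
    conv1d = d_inner * d_conv + d_inner         # depthwise convolution weights plus bias
    x_proj = d_inner * (dt_rank + 2 * d_state)  # produces dt, B and C
    dt_proj = dt_rank * d_inner + d_inner       # weights plus bias
    a_log = d_inner * d_state                   # state transition parameters
    d_skip = d_inner                            # skip connection parameters
    out_proj = d_inner * d_model                # output projection
    norm = d_model                              # per-block normalization weights
    return in_proj + conv1d + x_proj + dt_proj + a_log + d_skip + out_proj + norm


def mamba_model_params(d_model: int = 4096, n_layers: int = 64, vocab_size: int = 50432,
                       tied_embeddings: bool = True) -> int:
    """Total parameters: blocks, token embedding, optional untied head, final norm."""
    embedding = vocab_size * d_model
    head = 0 if tied_embeddings else vocab_size * d_model
    final_norm = d_model
    return n_layers * mamba_block_params(d_model) + embedding + head + final_norm


if __name__ == "__main__":
    print(f"tied:   {mamba_model_params(tied_embeddings=True) / 1e9:.2f}B")
    print(f"untied: {mamba_model_params(tied_embeddings=False) / 1e9:.2f}B")
```

Under these assumptions the estimate lands between roughly 6.9 and 7.2 billion parameters, which is consistent with the 7B label.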

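Training details

Mamba-7B was trained on 128 H100 80GB GPUs using AWS SageMaker.
Training began in March 2024 and lasted three weeks.

| Hyperparameter | Value |
|------------|---------|
| Precision | bfloat16 |
| Optimizer | AdamW |
| Learning rate | 3e-4 |
| LR cooldown end | 1e-5 |
| Warmup steps | 2000 |
| Z-loss | 1e-4 |
| Batch size | 2M |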
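The learning-rate entries above describe a warmup followed by a cooldown. A minimal sketch of such a schedule is shown below, under two assumptions that the table does not confirm: the cooldown follows a cosine curve, and the batch size of 2M is counted in tokens, which would give 1.2T / 2M = 600,000 optimizer steps. The function name `learning_rate` is illustrative.

```python
import math

PEAK_LR = 3e-4       # "Learning rate" in the table
FINAL_LR = 1e-5      # "LR cooldown end" in the table
WARMUP_STEPS = 2000  # "Warmup steps" in the table

# Assumption: the 2M batch size is measured in tokens.
TOTAL_STEPS = 1_200_000_000_000 // 2_000_000


def learning_rate(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Linear warmup to PEAK_LR, then an assumed cosine cooldown to FINAL_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    progress = min(progress, 1.0)  # hold the final value past the end of training
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return FINAL_LR + (PEAK_LR - FINAL_LR) * cosine


if __name__ == "__main__":
    for step in (0, 1000, 2000, 300_000, TOTAL_STEPS):
        print(step, learning_rate(step))
```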

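Performance evaluation

Evaluation was run with the Eleuther LM Eval Harness repository. The table below compares Mamba-7B with other base models:

| Model | HellaSwag | PIQA | Winogrande | ARC-E | ARC-C | MMLU (5-shot) |
|------------|---------|---------|---------|---------|---------|---------|
| Mamba-1.4B | 59.0 | 73.9 | 61.4 | 65.5 | 32.9 | 25.2 |
| Mamba-2.8B | 71.0 | 78.1 | 65.9 | 68.2 | 41.7 | 26.2 |
| RWKV5-1.7T-7B | 73.0 | 78.6 | 72.9 | 75.8 | 45.6 | 34.9 |
| Llama2-7B | 76.0 | 79.1 | 69.1 | 76.3 | 46.3 | 45.9 |
| Gemma-7B | 80.7 | 81.9 | 73.7 | 81.1 | 53.2 | 62.9 |
| Mistral-7B | 81.0 | 82.1 | 74.0 | 80.9 | 53.8 | 62.4 |
| Mamba-7B | 77.9 | 81.0 | 71.8 | 77.5 | 46.7 | 33.3 |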
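To read the comparison more directly, the short sketch below computes per-benchmark gaps between two rows of the table. The scores are copied from the table; the helper name `score_gaps` is illustrative.

```python
from typing import Dict

BENCHMARKS = ["HellaSwag", "PIQA", "Winogrande", "ARC-E", "ARC-C", "MMLU"]

# Scores copied from the comparison table.
SCORES = {
    "Llama2-7B": [76.0, 79.1, 69.1, 76.3, 46.3, 45.9],
    "Mamba-7B": [77.9, 81.0, 71.8, 77.5, 46.7, 33.3],
}


def score_gaps(model: str, baseline: str) -> Dict[str, float]:
    """Return model score minus baseline score for each benchmark."""
    return {
        name: round(m - b, 1)
        for name, m, b in zip(BENCHMARKS, SCORES[model], SCORES[baseline])
    }


if __name__ == "__main__":
    for name, gap in score_gaps("Mamba-7B", "Llama2-7B").items():
        print(f"{name:12s} {gap:+.1f}")
```

Mamba-7B is ahead of Llama2-7B on the five commonsense and reasoning benchmarks but trails it by more than 12 points on 5-shot MMLU.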

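🔧 Technical details

Mamba is a state space model that does not use self-attention and performs well on natural language processing tasks.
Training ran on AWS SageMaker with the hyperparameters listed in the training details above, including bfloat16 precision and a Z-loss coefficient of 1e-4, chosen for model performance and training stability.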
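One of the listed hyperparameters is a Z-loss of 1e-4. The model card does not define it, so the sketch below assumes the common formulation used in large language model training: an auxiliary penalty on the squared log of the softmax normalizer, added to the cross-entropy loss to keep logits from drifting. The function names are illustrative, and this is a single-token example in plain Python rather than the OpenLM code.

```python
import math
from typing import List

Z_LOSS_COEFF = 1e-4  # "Z-loss" value from the hyperparameter table


def log_sum_exp(logits: List[float]) -> float:
    """Numerically stable log of the softmax normalizer Z."""
    peak = max(logits)
    return peak + math.log(sum(math.exp(logit - peak) for logit in logits))


def loss_with_z_loss(logits: List[float], target: int, coeff: float = Z_LOSS_COEFF) -> float:
    """Cross-entropy for one token plus the assumed penalty coeff * log(Z)^2."""
    log_z = log_sum_exp(logits)
    cross_entropy = log_z - logits[target]
    z_penalty = coeff * log_z ** 2
    return cross_entropy + z_penalty


if __name__ == "__main__":
    # Shifting all logits by a constant leaves the cross-entropy unchanged
    # but increases the penalty, which discourages large logit magnitudes.
    print(loss_with_z_loss([0.0, 0.0], target=0))
    print(loss_with_z_loss([10.0, 10.0], target=0))
```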

📄 License

This model is released under the Apache License, Version 2.0.

How to cite

If you use this model, please cite the paper Linearizing Large Language Models:

@article{Mercat2024Linearizing,
  title={Linearizing Large Language Models},
  author={Jean Mercat and Igor Vasiljevic and Sedrick Keh and Kushal Arora and Achal Dave and Adrien Gaidon and Thomas Kollar},
  journal={arXiv preprint arXiv:2405.06640},
  year={2024}
}

References

Mamba

@article{mamba,
  title={Mamba: Linear-Time Sequence Modeling with Selective State Spaces},
  author={Gu, Albert and Dao, Tri},
  journal={arXiv preprint arXiv:2312.00752},
  year={2023}
}

OpenLM

@misc{open_lm,
  author = {Gururangan, Suchin and Wortsman, Mitchell and Gadre, Samir Yitzhak and Dave, Achal and Kilian, Maciej and Shi, Weijia and Mercat, Jean and Smyrnis, Georgios and Ilharco, Gabriel and Jordan, Matt and Heckel, Reinhard and Dimakis, Alex and Farhadi, Ali and Shankar, Vaishaal and Schmidt, Ludwig},
  title = {{open_lm}:  a minimal but performative language modeling (LM) repository},
  year = {2023},
  note = {GitHub repository},
  url = {https://github.com/mlfoundations/open_lm/}
}