Mistral-SUPRA open-source model: a practical tool that combines Transformer and recurrent model capabilities

Mistral Supra

Developed by TRI-ML

Mistral-SUPRA is a linear RNN model initialized from Mistral-7B that can function as both a Transformer and a recurrent model.

Large language model

PyTorch

Language: English | License: Apache-2.0 | #Linear RNN conversion #Dual-mode inference #Efficient training

Downloads: 163

Released: 4/9/2024

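Model overview

Mistral-7B is converted into a linear RNN through further training. At inference time the model can run in either parallel or recurrent mode, and it is intended for text generation.

Model features

Linear RNN architecture

Mistral-7B is converted into a linear RNN that can function as both a Transformer and a recurrent model.

Dual-mode inference

Supports parallel and recurrent inference; the mode is selected at generation time with the use_cache argument.

Efficient training

Training on 100B tokens took only 1.5 days on 128 H100 80GB GPUs.

Model capabilities

Text generation

Language understanding

Use cases

Natural language processing

Text completion

Generates a coherent continuation of a given text fragment.

Example output: 'Machine learning is a branch of artificial intelligence (AI) that enables computers to learn from experience...'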

🚀 Mistral-SUPRA

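The Mistral-SUPRA model is initialized from Mistral-7B and further trained into a linear RNN. It accompanies the paper "Linearizing Large Language Models" and can function as both a Transformer and a recurrent model at inference time.

🚀 Quick start

To use Mistral-SUPRA, first install the fork of OpenLM that supports linear attention: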

pip install git+https://github.com/tri-ml/linear_open_lm.git

Import the OpenLM classes:

from open_lm.open_lm_hf import *

Load the model with AutoTokenizer and AutoModelForCausalLM:

# Import the OpenLM classes before loading, so the Auto* loaders can resolve the model.
from open_lm.open_lm_hf import *
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tri-ml/mistral-supra")
model = AutoModelForCausalLM.from_pretrained("tri-ml/mistral-supra")

# Tokenize a prompt and sample up to 50 new tokens.
inputs = tokenizer(["Machine learning is"], return_tensors="pt")
gen_kwargs = {"max_new_tokens": 50, "top_p": 0.8, "temperature": 0.8, "do_sample": True, "repetition_penalty": 1.1}
output = model.generate(inputs['input_ids'], **gen_kwargs)
output = tokenizer.decode(output[0].tolist(), skip_special_tokens=True)
print(output)
# Example output (sampling is enabled, so results vary between runs):
# Machine learning is a branch of artificial intelligence (AI) that enables computers to learn from experience without being explicitly programmed. Machine learning is used in a wide range of applications, including spam filtering, image recognition, speech recognition, and computer-based medical diagnosis

The model supports two inference modes, parallel and recurrent, selected with use_cache:

# Recurrent mode
output = model.to('cuda').generate(inputs['input_ids'].to('cuda'), use_cache=True, **gen_kwargs)

# Parallel mode
output = model.to('cuda').generate(inputs['input_ids'].to('cuda'), use_cache=False, **gen_kwargs)
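
Why the two modes can produce the same result is easiest to see on a toy example. The sketch below is a generic causal linear attention written in plain Python, not the model's actual layer: the `elu(x) + 1` feature map, the sum-based normalization, and all function names are assumptions made for illustration, and the real feature map and normalization are defined in the linear_open_lm code and may differ. It computes the same outputs once from the full sequence (parallel form) and once token by token with a fixed-size state (recurrent form).

```python
import math
import random


def feature_map(x):
    """elu(x) + 1: a simple positive feature map (an assumption for this sketch)."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def parallel_linear_attention(queries, keys, values):
    """Causal linear attention computed from the full sequence at once."""
    phi_q = [feature_map(q) for q in queries]
    phi_k = [feature_map(k) for k in keys]
    d_v = len(values[0])
    outputs = []
    for t in range(len(queries)):
        # Unnormalized weights over positions 0..t (causal mask).
        weights = [dot(phi_q[t], phi_k[s]) for s in range(t + 1)]
        total = sum(weights)
        outputs.append([
            sum(w * values[s][j] for s, w in enumerate(weights)) / total
            for j in range(d_v)
        ])
    return outputs


def recurrent_linear_attention(queries, keys, values):
    """Same computation, one token at a time, carrying a fixed-size state."""
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
    norm = [0.0] * d_k                         # running sum of phi(k)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        phi_q, phi_k = feature_map(q), feature_map(k)
        for i in range(d_k):
            norm[i] += phi_k[i]
            for j in range(d_v):
                state[i][j] += phi_k[i] * v[j]
        denom = dot(phi_q, norm)
        outputs.append([
            sum(phi_q[i] * state[i][j] for i in range(d_k)) / denom
            for j in range(d_v)
        ])
    return outputs


random.seed(0)
seq_len, d_k, d_v = 6, 4, 3
Q = [[random.gauss(0, 1) for _ in range(d_k)] for _ in range(seq_len)]
K = [[random.gauss(0, 1) for _ in range(d_k)] for _ in range(seq_len)]
V = [[random.gauss(0, 1) for _ in range(d_v)] for _ in range(seq_len)]

par = parallel_linear_attention(Q, K, V)
rec = recurrent_linear_attention(Q, K, V)
max_diff = max(abs(a - b) for pr, rr in zip(par, rec) for a, b in zip(pr, rr))
print("largest difference between the two forms:", max_diff)
```

In this sketch the recurrent form stores only `state` and `norm`, whose sizes do not grow with sequence length, while the parallel form revisits every earlier position for each output.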

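✨ Key features

Initialized from Mistral-7B and converted into a linear model that can function as both a Transformer and a recurrent model.
Parallel or recurrent mode can be selected at inference time with the use_cache argument.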

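📚 Documentation

Model details

Developer: Toyota Research Institute
Model type: An autoregressive language model initialized from Mistral-7B and trained into a linear model following the SUPRA architecture.
Dataset: Initialized from Mistral-7B and trained on 100B tokens of RefinedWeb.
Tokenizer: mistralai/Mistral-7B-v0.1
Library: OpenLM (use the fork that supports linear attention)
License: This model is released under the Apache License, Version 2.0.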

Parameters	Hidden size	Layers	Vocabulary size	Sequence length
7B	4096	32	32000	2048

Training details

Mistral-SUPRA was trained with AWS SageMaker on 128 H100 80GB GPUs.
Training on 100B tokens completed in 1.5 days.
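
The figures above imply a rough training throughput. The arithmetic below is a back-of-the-envelope sketch, assuming that "100B" means exactly 100e9 tokens, that "1.5 days" is the full wall-clock time, and that throughput was uniform across the run; none of these is stated in the model card.

```python
tokens = 100e9   # 100B training tokens (from the text)
days = 1.5       # reported training time
gpus = 128       # H100 80GB GPUs

seconds = days * 24 * 3600
tokens_per_second = tokens / seconds
tokens_per_gpu_second = tokens_per_second / gpus
gpu_hours = gpus * days * 24

print(f"{tokens_per_second:,.0f} tokens/s overall")
print(f"{tokens_per_gpu_second:,.0f} tokens/s per GPU")
print(f"{gpu_hours:,.0f} GPU-hours")
```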

Hyperparameter	Value
Precision	`bfloat16`
Optimizer	AdamW
Learning rate	3e-5
Learning rate cooldown end value	1e-5
Warmup steps	1000
Batch size	2M
QK norm	False
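
To make the learning-rate rows concrete, here is a minimal sketch of a schedule consistent with them. Only the peak value (3e-5), the end value (1e-5), and the 1000 warmup steps come from the table. The linear warmup, the cosine shape of the cooldown, and the total of 50,000 steps (100B tokens divided by a batch of 2M, assuming the batch size is counted in tokens) are assumptions, and `learning_rate` is a hypothetical helper, not part of OpenLM.

```python
import math

PEAK_LR = 3e-5         # from the table
END_LR = 1e-5          # from the table
WARMUP_STEPS = 1000    # from the table
TOTAL_STEPS = 50_000   # assumption: 100B tokens / 2M tokens per batch


def learning_rate(step):
    """Linear warmup to PEAK_LR, then cosine cooldown to END_LR (assumed shape)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    return END_LR + 0.5 * (PEAK_LR - END_LR) * (1 + math.cos(math.pi * progress))


for step in (0, 500, 1000, 25_000, 50_000):
    print(step, f"{learning_rate(step):.2e}")
```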

Performance evaluation

Evaluation uses the Eleuther LM Eval Harness repository.

The table below compares Mistral-SUPRA with other models of similar size:

Model	HellaSwag	PIQA	Winogrande	ARC-E	ARC-C	MMLU (5-shot)
Llama2-7B	76.0	79.1	69.1	76.3	46.3	45.9
Gemma-7B	80.7	81.9	73.7	81.1	53.2	62.9
Mistral-7B	81.0	82.1	74.0	80.9	53.8	62.4
RWKV5-1.7T-7B	73.0	78.6	72.9	75.8	45.6	34.9
Mamba-7B	77.9	81.0	71.8	77.5	46.7	33.3
Mistral-SUPRA	77.1	80.4	70.3	75.9	45.8	34.2
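
One way to read the table is to look at the per-task gap between Mistral-SUPRA and the Mistral-7B model it was initialized from. The short script below only re-uses the numbers in the two corresponding rows; the `scores` and `gaps` names are illustrative.

```python
# Scores copied from the comparison table.
scores = {
    "Mistral-7B": {
        "HellaSwag": 81.0, "PIQA": 82.1, "Winogrande": 74.0,
        "ARC-E": 80.9, "ARC-C": 53.8, "MMLU (5-shot)": 62.4,
    },
    "Mistral-SUPRA": {
        "HellaSwag": 77.1, "PIQA": 80.4, "Winogrande": 70.3,
        "ARC-E": 75.9, "ARC-C": 45.8, "MMLU (5-shot)": 34.2,
    },
}

# Difference in points: linearized model minus its Transformer starting point.
gaps = {
    task: round(scores["Mistral-SUPRA"][task] - base, 1)
    for task, base in scores["Mistral-7B"].items()
}

for task, gap in gaps.items():
    print(f"{task:<14} {gap:+.1f}")
```

On these numbers the gap is smallest on PIQA and largest on 5-shot MMLU.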

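🔧 Technical details

The model is initialized from Mistral-7B and converted into a linear RNN through further training. The linear attention code is available at https://github.com/TRI-ML/linear_open_lm/. At inference time, the use_cache argument selects the mode: use_cache=True runs the model recurrently, and use_cache=False runs it in parallel.

📄 License

This model is released under the Apache License, Version 2.0.

📚 Citation

If you use this model, please cite the paper "Linearizing Large Language Models":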

@article{Mercat2024Linearizing,
  title={Linearizing Large Language Models},
  author={Jean Mercat and Igor Vasiljevic and Sedrick Keh and Kushal Arora and Achal Dave and Adrien Gaidon and Thomas Kollar},
  year={2024},
  journal={arXiv preprint arXiv:2405.06640},
}

OpenLM citation

@misc{open_lm,
  author = {Gururangan, Suchin and Wortsman, Mitchell and Gadre, Samir Yitzhak and Dave, Achal and Kilian, Maciej and Shi, Weijia and Mercat, Jean and Smyrnis, Georgios and Ilharco, Gabriel and Jordan, Matt and Heckel, Reinhard and Dimakis, Alex and Farhadi, Ali and Shankar, Vaishaal and Schmidt, Ludwig},
  title = {{open_lm}:  a minimal but performative language modeling (LM) repository},
  year = {2023},
  note = {GitHub repository},
  url = {https://github.com/mlfoundations/open_lm/}
}