mamba-gpt-3b open-source large language model: rivals llama-7b with high-performance conversational interaction


Mamba Gpt 3b

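Developed by CobraMamba

A 3B-parameter large language model fine-tuned from open-lama. It outperforms the original model and rivals llama-7b.

Large language model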

Transformers

English · License: Apache-2.0 · #efficient-3B-fine-tune #7B-level-performance #leads-multi-task-evaluation

Downloads: 653

Released: 6/12/2023

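Model overview

A 3B-parameter large language model fine-tuned from open-lama. It performs well across a range of evaluation tasks and supports natural language processing tasks such as text generation.

Model features

Efficient fine-tuning

Careful fine-tuning of open-lama lets the model outperform the original on several evaluation subtasks.

Small size, high performance

Reaches performance comparable to llama-7b with only 3B parameters.

Optimized inference configuration

Offers fine-grained control over generation parameters such as temperature and repetition penalty.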

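Model capabilities

Text generation

Question answering

Knowledge reasoning

Use cases

Intelligent Q&A

Health knowledge Q&A

Answers common-sense questions about healthy living

For example, the sample answer on why drinking water is healthy

Content creation

Short text generation

Generates coherent text from a prompt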

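🚀 mamba-gpt-3b model

This project fine-tunes the open-lama model. The result outperforms the original model on several evaluation subtasks, making it the best-performing 3B model at present, with performance comparable to llama-7b.

🚀 Quick start

To use the model with the transformers library on a machine with a GPU, first make sure the transformers, accelerate, and torch libraries are installed.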

pip install transformers==4.29.2
pip install accelerate==0.19.0
pip install torch==2.0.0

import torch
from transformers import pipeline

generate_text = pipeline(
    model="CobraMamba/mamba-gpt-3b",
    torch_dtype="auto",
    trust_remote_code=True,
    use_fast=False,
    device_map={"": "cuda:0"},
)

res = generate_text(
    "Why is drinking water so healthy?",
    min_new_tokens=2,
    max_new_tokens=1024,
    do_sample=False,  # greedy decoding
    num_beams=1,
    temperature=0.3,  # only used when do_sample=True; ignored by greedy decoding
    repetition_penalty=1.2,
    renormalize_logits=True,
)
print(res[0]["generated_text"])
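In transformers, `repetition_penalty` and `temperature` are applied to the logits before the next token is chosen. The pure-Python sketch below mirrors the documented behavior of the library's repetition penalty (positive logits of already-generated tokens are divided by the penalty, negative ones multiplied) and of temperature scaling. The function names are hypothetical and this is not the library's code. It also shows why `temperature=0.3` has no effect in the call above: with `do_sample=False` and `num_beams=1` the arg-max token is taken, and dividing every logit by the same positive constant never changes the arg-max.

```python
import math


def apply_repetition_penalty(logits, generated_ids, penalty):
    """Penalize tokens that already appear in generated_ids (CTRL-style)."""
    out = list(logits)
    for tok in set(generated_ids):
        score = out[tok]
        # Negative logits are pushed further down, positive ones are shrunk.
        out[tok] = score * penalty if score < 0 else score / penalty
    return out


def apply_temperature(logits, temperature):
    """Scale logits: values below 1 sharpen the distribution, above 1 flatten it."""
    return [x / temperature for x in logits]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """do_sample=False, num_beams=1: take the arg-max token id."""
    return max(range(len(logits)), key=lambda i: logits[i])


logits = [2.0, 1.9, -1.0, 0.5]
history = [0, 2]  # token ids 0 and 2 were already generated

penalized = apply_repetition_penalty(logits, history, 1.2)
print(greedy_pick(logits))     # 0: highest raw logit
print(greedy_pick(penalized))  # 1: token 0 was penalized for repetition
# Temperature rescales all logits equally, so the greedy choice is unchanged.
print(greedy_pick(apply_temperature(penalized, 0.3)))  # 1
```

Temperature only matters when sampling (`do_sample=True`), where it changes the probabilities that `softmax` produces rather than their ordering.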

You can print a sample prompt after the preprocessing step to see how it is fed to the tokenizer:

print(generate_text.preprocess("Why is drinking water so healthy?")["prompt_text"])

<|prompt|>Why is drinking water so healthy?</s><|answer|>
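When bypassing the pipeline, the prompt must be wrapped in the same template by hand. A minimal sketch, assuming the single-turn format printed above is the whole template (a multi-turn format is not documented here); `build_prompt` and `extract_answer` are hypothetical helper names, not part of the model repository:

```python
# Template copied from the printed example prompt above.
PROMPT_TEMPLATE = "<|prompt|>{question}</s><|answer|>"


def build_prompt(question: str) -> str:
    """Wrap a single user question in the model's prompt format."""
    return PROMPT_TEMPLATE.format(question=question)


def extract_answer(generated: str, prompt: str) -> str:
    """Drop an echoed prompt (if present) and any end-of-sequence markers."""
    if generated.startswith(prompt):
        generated = generated[len(prompt):]
    return generated.replace("</s>", "").strip()


prompt = build_prompt("Why is drinking water so healthy?")
print(prompt)  # <|prompt|>Why is drinking water so healthy?</s><|answer|>
```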

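Alternatively, you can download mamba_gpt_pipeline.py, store it alongside your notebook, and construct the pipeline yourself from the loaded model and tokenizer. If the model and the tokenizer are fully supported in the transformers package, this allows you to set trust_remote_code=False.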

import torch
from mamba_gpt_pipeline import MambaGPTTextGenerationPipeline
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "CobraMamba/mamba-gpt-3b",
    use_fast=False,
    padding_side="left",
    trust_remote_code=False,
)
model = AutoModelForCausalLM.from_pretrained(
    "CobraMamba/mamba-gpt-3b",
    torch_dtype="auto",
    device_map={"": "cuda:0"},
    trust_remote_code=False,
)
generate_text = MambaGPTTextGenerationPipeline(model=model, tokenizer=tokenizer)

res = generate_text(
    "Why is drinking water so healthy?",
    min_new_tokens=2,
    max_new_tokens=1024,
    do_sample=False,  # greedy decoding
    num_beams=1,
    temperature=0.3,  # only used when do_sample=True; ignored by greedy decoding
    repetition_penalty=1.2,
    renormalize_logits=True,
)
print(res[0]["generated_text"])

You can also construct the pipeline yourself from the loaded model and tokenizer, taking the preprocessing step into account:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "CobraMamba/mamba-gpt-3b"  # either local folder or huggingface model name
# Important: The prompt needs to be in the same format the model was trained with.
# You can find an example prompt in the experiment logs.
prompt = "<|prompt|>How are you?</s><|answer|>"

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    use_fast=False,
    trust_remote_code=False,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map={"": "cuda:0"},
    trust_remote_code=False,
)
model.eval()  # the model is already placed on cuda:0 via device_map
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to("cuda")

# generate configuration can be modified to your needs
tokens = model.generate(
    **inputs,
    min_new_tokens=2,
    max_new_tokens=1024,
    do_sample=False,  # greedy decoding
    num_beams=1,
    temperature=0.3,  # only used when do_sample=True; ignored by greedy decoding
    repetition_penalty=1.2,
    renormalize_logits=True,
)[0]

tokens = tokens[inputs["input_ids"].shape[1]:]
answer = tokenizer.decode(tokens, skip_special_tokens=True)
print(answer)
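The slice `tokens[inputs["input_ids"].shape[1]:]` works because `generate` on a decoder-only model returns the prompt ids followed by the new ids. When several prompts are batched, `padding_side="left"` (used in the pipeline example) puts each prompt's last real token in the final column, so the model continues directly from it and every row's new ids start at the same index. The pure-Python simulation below illustrates this; `fake_generate` is a hypothetical stand-in for the model, and pad id 0 is an assumption.

```python
PAD_ID = 0  # assumed pad token id for this illustration


def left_pad(batch, pad_id=PAD_ID):
    """Left-pad token-id lists to a common length, like padding_side='left'."""
    width = max(len(seq) for seq in batch)
    return [[pad_id] * (width - len(seq)) + seq for seq in batch]


def fake_generate(padded, new_tokens):
    """Hypothetical stand-in for model.generate: prompt ids, then new ids."""
    return [row + extra for row, extra in zip(padded, new_tokens)]


prompts = [[11, 12, 13], [21]]
padded = left_pad(prompts)  # [[11, 12, 13], [0, 0, 21]]
outputs = fake_generate(padded, [[101, 102], [201, 202]])

prompt_len = len(padded[0])  # plays the role of inputs["input_ids"].shape[1]
continuations = [row[prompt_len:] for row in outputs]
print(continuations)  # [[101, 102], [201, 202]]
```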

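✨ Key features

Fine-tuned from open-lama, outperforming the original model on several evaluation subtasks.
Currently the best-performing 3B model, with performance comparable to llama-7b.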

📦 Installation

Install the following dependencies to use the model:

pip install transformers==4.29.2
pip install accelerate==0.19.0
pip install torch==2.0.0

📚 Documentation

Model metrics

Metric	Value
MMLU (5-shot)	25.3
ARC (25-shot)	40.5
HellaSwag (10-shot)	64.9
TruthfulQA (0-shot)	37.1
Average	42.0
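The reported average appears to be the unweighted mean of the four benchmark scores. A quick check using only the values in the table:

```python
# Scores copied from the metrics table above.
scores = {
    "MMLU (5-shot)": 25.3,
    "ARC (25-shot)": 40.5,
    "HellaSwag (10-shot)": 64.9,
    "TruthfulQA (0-shot)": 37.1,
}
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 41.95; the table reports it rounded as 42.0
```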

We used the state-of-the-art Language Model Evaluation Harness to run the benchmarks above.

Model architecture

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
          (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)
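The module shapes in this printout fully determine the parameter count. The tally below assumes untied input and output embeddings (the printout lists a separate `lm_head`), counts only the weight vector of each RMSNorm, and treats the rotary embedding as parameter-free. With the printed values it comes to about 6.74 billion parameters, the size of the LLaMA-7B configuration; substituting the values from the published checkpoint's config.json gives the count for that checkpoint.

```python
# Dimensions read off the printed module shapes above.
VOCAB, HIDDEN, INTERMEDIATE, LAYERS = 32000, 4096, 11008, 32

embed = VOCAB * HIDDEN           # embed_tokens
attention = 4 * HIDDEN * HIDDEN  # q_proj, k_proj, v_proj, o_proj (no bias)
mlp = 3 * HIDDEN * INTERMEDIATE  # gate_proj, up_proj, down_proj (no bias)
norms = 2 * HIDDEN               # input_layernorm + post_attention_layernorm weights
per_layer = attention + mlp + norms
final_norm = HIDDEN              # model.norm
lm_head = HIDDEN * VOCAB         # untied output projection

total = embed + LAYERS * per_layer + final_norm + lm_head
print(f"{total:,}")  # 6,738,415,616
```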

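Evaluation results

We evaluated OpenLLaMA on a wide range of tasks using lm-evaluation-harness. The LLaMA results were generated by running the original LLaMA models on the same evaluation metrics. We note that our results for the LLaMA model differ slightly from those in the original LLaMA paper, which we believe is due to different evaluation protocols. Similar differences have been reported in this issue of lm-evaluation-harness. Additionally, we present results for GPT-J, a 6B-parameter model trained by EleutherAI on the Pile dataset.

The original LLaMA model was trained on 1 trillion tokens and GPT-J on 500 billion tokens. We present the results in the table below. OpenLLaMA performs comparably to the original LLaMA and GPT-J on most tasks and outperforms them on some.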

Task/Metric	Fine-tuned GPT 3B	OpenLLaMA 3B
anli_r1/acc	0.35	0.33
anli_r2/acc	0.33	0.32
anli_r3/acc	0.35	0.35
arc_challenge/acc	0.35	0.34
arc_challenge/acc_norm	0.37	0.37
arc_easy/acc	0.71	0.69
arc_easy/acc_norm	0.65	0.65
boolq/acc	0.72	0.66
hellaswag/acc	0.49	0.43
hellaswag/acc_norm	0.66	0.67
openbookqa/acc	0.26	0.27
openbookqa/acc_norm	0.40	0.40
piqa/acc	0.76	0.75
piqa/acc_norm	0.76	0.76
record/em	0.88	0.88
record/f1	0.88	0.89
rte/acc	0.55	0.58
truthfulqa_mc/mc1	0.27	0.22
truthfulqa_mc/mc2	0.37	0.35
wic/acc	0.49	0.48
winogrande/acc	0.63	0.62
Average	0.53	0.52
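The Average row appears to be an unweighted mean over all 21 task/metric rows, counting `acc` and `acc_norm` separately. The sketch below recomputes both averages from the table and counts the rows where the fine-tuned model scores higher than, equal to, or lower than OpenLLaMA 3B; it uses only the values printed above.

```python
# (task/metric, fine-tuned GPT 3B, OpenLLaMA 3B), copied from the table above.
ROWS = [
    ("anli_r1/acc", 0.35, 0.33),
    ("anli_r2/acc", 0.33, 0.32),
    ("anli_r3/acc", 0.35, 0.35),
    ("arc_challenge/acc", 0.35, 0.34),
    ("arc_challenge/acc_norm", 0.37, 0.37),
    ("arc_easy/acc", 0.71, 0.69),
    ("arc_easy/acc_norm", 0.65, 0.65),
    ("boolq/acc", 0.72, 0.66),
    ("hellaswag/acc", 0.49, 0.43),
    ("hellaswag/acc_norm", 0.66, 0.67),
    ("openbookqa/acc", 0.26, 0.27),
    ("openbookqa/acc_norm", 0.40, 0.40),
    ("piqa/acc", 0.76, 0.75),
    ("piqa/acc_norm", 0.76, 0.76),
    ("record/em", 0.88, 0.88),
    ("record/f1", 0.88, 0.89),
    ("rte/acc", 0.55, 0.58),
    ("truthfulqa_mc/mc1", 0.27, 0.22),
    ("truthfulqa_mc/mc2", 0.37, 0.35),
    ("wic/acc", 0.49, 0.48),
    ("winogrande/acc", 0.63, 0.62),
]

finetuned_avg = sum(r[1] for r in ROWS) / len(ROWS)
openllama_avg = sum(r[2] for r in ROWS) / len(ROWS)
wins = sum(1 for _, a, b in ROWS if a > b)
ties = sum(1 for _, a, b in ROWS if a == b)
losses = len(ROWS) - wins - ties

print(f"{finetuned_avg:.2f} {openllama_avg:.2f}")  # 0.53 0.52
print(wins, ties, losses)                           # 11 6 4
```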

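We removed the CB and WSC tasks from the benchmark because our model performs suspiciously well on these two tasks. We hypothesize that the training set may contain benchmark data contamination.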

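Disclaimer

Please read this disclaimer carefully before using the large language model provided in this repository. By using the model, you agree to the following terms and conditions.

Biases and offensiveness: The large language model is trained on a diverse range of internet text data, which may contain biased, racist, offensive, or otherwise inappropriate content. By using this model, you acknowledge and accept that the generated content may sometimes exhibit biases or be offensive or inappropriate. The developers of this repository do not endorse, support, or promote any such content or viewpoints.
Limitations: The large language model is an AI-based tool, not a human. It may produce incorrect, nonsensical, or irrelevant responses. It is the user's responsibility to critically evaluate the generated content and decide whether to use it.
Use at your own risk: Users of this large language model must assume full responsibility for any consequences that may arise from their use of the tool. The developers and contributors of this repository shall not be held liable for any damage, loss, or harm resulting from the use or misuse of the provided model.
Ethical considerations: Users are encouraged to use the large language model responsibly and ethically. By using this model, you agree not to use it for purposes that promote hate speech, discrimination, harassment, or any form of illegal or harmful activity.
Reporting issues: If you encounter biased, offensive, or otherwise inappropriate content generated by the large language model, please report it to the repository maintainers through the provided channels. Your feedback will help improve the model and mitigate potential issues.
Changes to this disclaimer: The developers of this repository reserve the right to modify or update this disclaimer at any time without prior notice. It is the user's responsibility to review the disclaimer periodically to stay informed of any changes.

By using the large language model provided in this repository, you agree to accept and comply with the terms and conditions set out in this disclaimer. If you do not agree with any part of this disclaimer, you should refrain from using the model and any content generated by it.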