h2o-danube3.1-4b-chat open-source chat model - long-context dialogue support, free to chat


H2o Danube3.1 4b Chat

Developed by h2oai

A 4-billion-parameter chat model fine-tuned by H2O.ai, based on an adjusted Llama 2 architecture, with a context length of 8192 tokens

Large language model

Transformers

Language: English · License: Apache-2.0 · #OfflineOnMobile #LongContext #MultiTurnDialogue

Downloads: 271

Released: 11/29/2024

Model overview

A 4-billion-parameter large language model optimized for chat that can run natively and fully offline on a mobile phone.

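Model features

Mobile support

Runs natively and fully offline on a mobile phone

Long-context support

Supports a context length of 8192 tokens

Efficient inference

Supports 4-bit and 8-bit quantization for running on consumer hardware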

Model capabilities

Text generation

Dialogue systems

Question answering

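Use cases

Chat applications

Personal assistant

Serves as a personal chat assistant that answers everyday questions

Customer service bot

Automated question answering in customer service scenarios

Education

Tutoring

Helps students with questions about their studies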

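🚀 h2o-danube3.1-4b-chat model

h2o-danube3.1-4b-chat is a chat model with 4 billion parameters, fine-tuned by H2O.ai. It was trained with H2O LLM Studio and can run natively and fully offline on a mobile phone.

🚀 Quick start

You can try running the model offline on your own phone with H2O AI Personal GPT.

✨ Key features

Parameter count: about 4 billion parameters, based on an adjusted Llama 2 architecture.
Offline use: runs natively and fully offline on a mobile phone.
Tokenizer and context: uses the Mistral tokenizer with a 32,000-token vocabulary and a context length of up to 8192 tokens.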
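With a fixed 8192-token window, a long multi-turn chat eventually has to be shortened before it is sent to the model. The sketch below drops the oldest turns (keeping any system prompt and the latest message) until the prompt plus a reserved reply budget fits. It is an illustration under assumptions, not part of the model card: `trim_history`, the 512-token reply budget, and the `rough_count` word counter are hypothetical. With transformers you could pass a real counter such as `lambda m: len(tokenizer.apply_chat_template(m, add_generation_prompt=True))`.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

CONTEXT_LENGTH = 8192  # context length stated for h2o-danube3.1-4b-chat


def trim_history(
    messages: List[Message],
    count_tokens: Callable[[List[Message]], int],
    reply_budget: int = 512,  # assumed number of tokens reserved for the reply
    context_length: int = CONTEXT_LENGTH,
) -> List[Message]:
    """Drop the oldest turns until the prompt plus reply_budget fits the window."""
    limit = context_length - reply_budget
    # A leading system prompt is always kept.
    head = messages[:1] if messages and messages[0]["role"] == "system" else []
    tail = messages[len(head):]
    # Remove the oldest turns first; the most recent message is never dropped.
    while len(tail) > 1 and count_tokens(head + tail) > limit:
        tail = tail[1:]
        # Re-align so the remaining history starts with a user turn.
        while len(tail) > 1 and tail[0]["role"] != "user":
            tail = tail[1:]
    return head + tail


def rough_count(msgs: List[Message]) -> int:
    # Hypothetical stand-in for demonstration: counts whitespace words, not tokens.
    return sum(len(m["content"].split()) for m in msgs)


if __name__ == "__main__":
    chat = [
        {"role": "system", "content": "s1 s2"},
        {"role": "user", "content": "u1 u1 u1"},
        {"role": "assistant", "content": "a1 a1 a1"},
        {"role": "user", "content": "u2 u2"},
    ]
    # Tiny window so the trimming is visible.
    print(trim_history(chat, rough_count, reply_budget=2, context_length=10))
```

Dropping whole turns (and re-aligning to a user turn) keeps the role order that chat templates usually expect; a real application might instead summarize old turns.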

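📦 Installation

To run the model with the transformers library on a machine with a GPU, first make sure transformers is installed. The examples below load the model with device_map="auto", which additionally requires the accelerate package:

pip install transformers==4.45.0 accelerate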

💻 Usage examples

Basic usage

import torch
from transformers import pipeline

generate_text = pipeline(
    model="h2oai/h2o-danube3.1-4b-chat",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

# We use the HF Tokenizer chat template to format each message
# https://huggingface.co/docs/transformers/main/en/chat_templating

messages = [
    {
        "role": "system",
        "content": "You are a friendly and polite chatbot.",
    },
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "I'm doing great, how about you?"},
    {"role": "user", "content": "Why is drinking water so healthy?"},
]

res = generate_text(
    messages,
    renormalize_logits=True
)
print(res[0]["generated_text"][-1]['content'])
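Because the chat pipeline returns the whole conversation with the new assistant message appended (which is why the last element is read above), continuing a multi-turn chat only requires taking that list and adding the next user turn. A minimal sketch; `next_turn` is a hypothetical helper name, and the commented-out usage assumes the `generate_text` pipeline defined above.

```python
from typing import Dict, List

Message = Dict[str, str]


def next_turn(pipeline_output: List[dict], user_text: str) -> List[Message]:
    """Build the message list for the next request from a chat pipeline result."""
    # generated_text holds the input messages plus the newly generated assistant reply.
    history = list(pipeline_output[0]["generated_text"])  # copy, do not mutate the result
    history.append({"role": "user", "content": user_text})
    return history


# Usage with the pipeline from the basic example (requires the model):
# res = generate_text(messages, renormalize_logits=True)
# messages = next_turn(res, "How much water should I drink per day?")
# res = generate_text(messages, renormalize_logits=True)
# print(res[0]["generated_text"][-1]["content"])
```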

Advanced usage

You can print a sample prompt with the chat template applied to see how the messages are fed to the tokenizer:

print(generate_text.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
))

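You can also skip the pipeline and run generation yourself with the loaded model and tokenizer, handling the preprocessing steps explicitly: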

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "h2oai/h2o-danube3.1-4b-chat"

messages = [
    {
        "role": "system",
        "content": "You are a friendly and polite chatbot.",
    },
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "I'm doing great, how about you?"},
    {"role": "user", "content": "Why is drinking water so healthy?"},
]

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
# device_map="auto" has already placed the weights, so no extra .cuda() call is needed
model.eval()

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)  # put the inputs on the same device as the model's first layers

tokens = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_new_tokens=256,  # cap on the reply length; adjust as needed
    renormalize_logits=True,
)[0]

# Keep only the newly generated tokens by dropping the prompt
tokens = tokens[inputs["input_ids"].shape[1]:]
answer = tokenizer.decode(tokens, skip_special_tokens=True)
print(answer)

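🔧 Technical details

Model architecture

We adjusted the Llama 2 architecture to a total of about 4 billion parameters. For details, please refer to our technical report. The architecture hyperparameters are as follows: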

Hyperparameter	Value
n_layers	24
n_heads	32
n_query_groups	8
n_embd	3840
vocabulary size	32000
sequence length	8192
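These hyperparameters determine the attention projection shapes in the module listing below: each of the 32 query heads has dimension 3840 / 32 = 120, and the 8 key/value groups give k_proj and v_proj an output width of 8 × 120 = 960 (grouped-query attention). The same numbers allow a back-of-the-envelope KV-cache estimate for a full 8192-token context. The sketch assumes a bf16 cache (2 bytes per value) and ignores framework overhead; `kv_cache_bytes` is an illustrative helper, not a library API.

```python
N_LAYERS = 24
N_HEADS = 32
N_QUERY_GROUPS = 8  # number of key/value heads shared by the query heads
N_EMBD = 3840

head_dim = N_EMBD // N_HEADS          # per-head dimension
kv_width = N_QUERY_GROUPS * head_dim  # out_features of k_proj / v_proj


def kv_cache_bytes(seq_len: int, batch: int = 1, bytes_per_value: int = 2) -> int:
    """Approximate KV-cache size: one key and one value vector per layer and position."""
    return 2 * N_LAYERS * kv_width * seq_len * batch * bytes_per_value


if __name__ == "__main__":
    print(head_dim, kv_width, kv_cache_bytes(8192) / 2**20, "MiB")
```

Under these assumptions a single full-length sequence needs roughly 720 MiB of cache; without grouped-query attention (32 KV heads instead of 8) it would be four times larger.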

The printed module structure of the model is as follows:

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 3840, padding_idx=0)
    (layers): ModuleList(
      (0-23): 24 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=3840, out_features=3840, bias=False)
          (k_proj): Linear(in_features=3840, out_features=960, bias=False)
          (v_proj): Linear(in_features=3840, out_features=960, bias=False)
          (o_proj): Linear(in_features=3840, out_features=3840, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=3840, out_features=10240, bias=False)
          (up_proj): Linear(in_features=3840, out_features=10240, bias=False)
          (down_proj): Linear(in_features=10240, out_features=3840, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((3840,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((3840,), eps=1e-05)
      )
    )
    (norm): LlamaRMSNorm((3840,), eps=1e-05)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=3840, out_features=32000, bias=False)
)
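Summing the weight shapes in the listing above confirms the "about 4 billion" figure. A minimal sketch: none of the layers have biases, RMSNorm contributes one weight vector per norm, and the rotary embedding has no learned parameters. Whether lm_head shares its weights with embed_tokens is not stated here, so both cases are computed; treating them as separate is an assumption.

```python
VOCAB = 32000
D_MODEL = 3840
D_KV = 960    # k_proj / v_proj out_features
D_FF = 10240  # gate_proj / up_proj width
N_LAYERS = 24


def param_count(tied_embeddings: bool = False) -> int:
    attn = 2 * D_MODEL * D_MODEL + 2 * D_MODEL * D_KV  # q_proj, o_proj + k_proj, v_proj
    mlp = 3 * D_MODEL * D_FF                           # gate_proj, up_proj, down_proj
    norms = 2 * D_MODEL                                # input and post-attention RMSNorm
    per_layer = attn + mlp + norms
    embed = VOCAB * D_MODEL
    final_norm = D_MODEL
    lm_head = 0 if tied_embeddings else VOCAB * D_MODEL
    return embed + N_LAYERS * per_layer + final_norm + lm_head


if __name__ == "__main__":
    print(f"untied: {param_count() / 1e9:.2f}B")                     # 3.96B
    print(f"tied:   {param_count(tied_embeddings=True) / 1e9:.2f}B")  # 3.84B
```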

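Quantization and sharding

You can load the model quantized by specifying load_in_8bit=True or load_in_4bit=True (this requires the bitsandbytes package; newer transformers versions prefer passing these flags through quantization_config=BitsAndBytesConfig(...)). In addition, setting device_map="auto" shards the model across multiple GPUs.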
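To see why quantization matters for consumer hardware, a rough lower bound on weight memory is the parameter count times the bytes per parameter. The sketch below assumes about 3.96 billion parameters (the sum of the weight shapes in the module listing, with an untied lm_head) and ignores quantization scales, the KV cache, and activations, so real usage will be higher; `weight_memory_gib` is a hypothetical helper.

```python
PARAMS = 3_961_839_360  # assumed: summed from the module listing, untied lm_head

# Bytes per parameter for each storage format (quantization overhead ignored).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "4bit": 0.5}


def weight_memory_gib(precision: str, params: int = PARAMS) -> float:
    """Lower bound on the memory needed just to hold the weights, in GiB."""
    return params * BYTES_PER_PARAM[precision] / 2**30


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        print(f"{precision:>5}: {weight_memory_gib(precision):.2f} GiB")
```

Halving the bytes per parameter halves the weight footprint, which is what brings a 4B model from roughly 7.4 GiB in bf16 down to under 2 GiB at 4 bits.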

📚 Detailed documentation

Benchmarks

🤗 Open LLM Leaderboard v2

Benchmark	acc_n
Average	16.21
IFEval	50.21
BBH	10.94
MATH Lvl 5	2.11
GPQA	4.7
MuSR	10.2
MMLU-PRO	19.1

🤗 Open LLM Leaderboard v1

Benchmark	acc_n
Average	61.05
ARC-Challenge	57.68
HellaSwag	79.05
MMLU	55.61
TruthfulQA	46.84
Winogrande	75.93
GSM8K	51.18
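The "Average" rows in both leaderboard tables match a plain unweighted mean of the six benchmark scores, rounded to two decimals. A small check using only the scores listed above; treating the leaderboard average as an unweighted mean is an inference from these numbers, and `mean_score` is a hypothetical helper.

```python
V2_SCORES = {"IFEval": 50.21, "BBH": 10.94, "MATH Lvl 5": 2.11,
             "GPQA": 4.7, "MuSR": 10.2, "MMLU-PRO": 19.1}
V1_SCORES = {"ARC-Challenge": 57.68, "HellaSwag": 79.05, "MMLU": 55.61,
             "TruthfulQA": 46.84, "Winogrande": 75.93, "GSM8K": 51.18}


def mean_score(scores: dict) -> float:
    """Unweighted mean of the benchmark scores, rounded like the tables."""
    return round(sum(scores.values()) / len(scores), 2)


if __name__ == "__main__":
    print("v2 average:", mean_score(V2_SCORES))
    print("v1 average:", mean_score(V1_SCORES))
```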

MT-Bench

First Turn: 7.394
Second Turn: 5.563
Average: 6.478

📄 License

This project is released under the Apache 2.0 license.

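⚠️ Disclaimer

Please read this disclaimer carefully before using the large language model provided in this repository. By using the model, you agree to the following terms and conditions.

Bias and offensiveness: The large language model is trained on a diverse range of internet text data, which may contain biased, racist, offensive, or otherwise inappropriate content. By using this model, you acknowledge and accept that the generated content may sometimes exhibit biases or be offensive or inappropriate. The developers of this repository do not endorse, support, or promote any such content or viewpoints.
Limitations: The large language model is an AI-based tool, not a human. It may produce incorrect, nonsensical, or irrelevant responses. It is the user's responsibility to critically evaluate the generated content and decide at their own discretion whether to use it.
Use at your own risk: Users of this large language model must assume full responsibility for any consequences that may arise from their use of the tool. The developers and contributors of this repository shall not be liable for any damages, losses, or harm resulting from the use or misuse of the model.
Ethical considerations: Users are encouraged to use the large language model responsibly and ethically. By using this model, you agree not to use it for purposes that promote hate speech, discrimination, harassment, or any form of illegal or harmful activity.
Reporting issues: If you encounter biased, offensive, or otherwise inappropriate content generated by the large language model, please report it to the repository maintainers through the provided channels. Your feedback will help improve the model and mitigate potential issues.
Changes to this disclaimer: The developers of this repository reserve the right to modify or update this disclaimer at any time without prior notice. It is the user's responsibility to review the disclaimer periodically to stay informed of any changes.

By using the large language model provided in this repository, you agree to accept and comply with the terms and conditions set out in this disclaimer. If you do not agree with any part of this disclaimer, you should refrain from using the model and any content generated by it.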