h2o-danube3-4b-chat: a free, open-source chat model for text generation that also runs offline on a phone

H2o Danube3 4b Chat

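Developed by h2oai

A 4-billion-parameter chat model fine-tuned by H2O.ai. It supports text generation tasks and can run offline on a phone.

Large language model

Transformers

Language: English. License: Apache-2.0. #runs-offline-on-phone #long-context-support #4B-parameters

Downloads: 41.49k

Released: 7/4/2024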

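Model overview

h2o-danube3-4b-chat is a chat model fine-tuned by H2O.ai with about 4 billion parameters. It handles text generation tasks, performs well across several benchmark suites, and can run offline on a phone.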

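Model features

Two released versions

A base model and a chat model are available to suit different use cases.

Low-resource operation

Runs natively and fully offline on a phone, so it can be used anywhere.

Long-context support

Uses the Mistral tokenizer with a vocabulary of 32,000 tokens; the model was trained with context lengths of up to 8,192 tokens.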

Model capabilities

Text generation

Chat

Multi-turn conversation handling

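Use cases

Chatbots

Health questions

Answers questions about healthy living, such as 'Why is drinking water so healthy?'

Intended result: detailed, accurate health advice

Customer service

Automated support

Handles common customer questions and responds immediately

Intended result: higher customer satisfaction and a lighter load on human agents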

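🚀 Quick start

You can run the model with the transformers library on a machine with a GPU. First, make sure transformers is installed (the quotes keep the shell from treating >= as a redirect):

pip install "transformers>=4.42.3"

Example usage: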

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="h2oai/h2o-danube3-4b-chat",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# We use the HF Tokenizer chat template to format each message
# https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {"role": "user", "content": "Why is drinking water so healthy?"},
]
prompt = pipe.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
res = pipe(
    prompt,
    return_full_text=False,
    max_new_tokens=256,
)
print(res[0]["generated_text"])

The chat template applies the correct prompt format automatically, so the model receives:

<|prompt|>Why is drinking water so healthy?</s><|answer|>
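
To check what the chat template emits, the format above can also be reproduced by hand. The following is a minimal sketch, not the library's implementation: `build_prompt` is a hypothetical helper, and the way it wraps assistant turns (`<|answer|>`, then the reply, then `</s>`) is an assumption extrapolated from the single-turn example. For real use, `apply_chat_template` remains the authoritative source.

```python
def build_prompt(messages, add_generation_prompt=True):
    """Hypothetical helper mirroring the single-turn prompt format shown above."""
    parts = []
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "user":
            parts.append(f"<|prompt|>{content}</s>")
        elif role == "assistant":
            # Assumption: assistant turns are wrapped the same way as user turns.
            parts.append(f"<|answer|>{content}</s>")
        else:
            raise ValueError(f"unsupported role: {role!r}")
    if add_generation_prompt:
        # Open an answer turn so the model continues from here.
        parts.append("<|answer|>")
    return "".join(parts)


messages = [{"role": "user", "content": "Why is drinking water so healthy?"}]
print(build_prompt(messages))
# <|prompt|>Why is drinking water so healthy?</s><|answer|>
```

Comparing this string with the output of `apply_chat_template(..., tokenize=False)` is a quick way to confirm that the tokenizer you loaded carries the expected template.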

Alternatively, load the tokenizer and model directly:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "h2oai/h2o-danube3-4b-chat"

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "user", "content": "Why is drinking water so healthy?"},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
# Move the inputs to the device the model was placed on by device_map="auto"
inputs = tokenizer(
    prompt, return_tensors="pt", add_special_tokens=False
).to(model.device)

# Generation settings can be adjusted to your needs
tokens = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    min_new_tokens=2,
    max_new_tokens=256,
)[0]

tokens = tokens[inputs["input_ids"].shape[1]:]
answer = tokenizer.decode(tokens, skip_special_tokens=True)
print(answer)

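✨ Key features

Two released versions: H2O.ai released two versions of this model, a base model and a chat model, to suit different use cases.
Low-resource operation: runs natively and fully offline on a phone, so it can be used anywhere.
Long-context support: uses the Mistral tokenizer with a vocabulary of 32,000 tokens; the model was trained with context lengths of up to 8,192 tokens.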
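
Because the model was trained with contexts of up to 8,192 tokens, it can help to budget prompt and generated tokens against that figure. The sketch below is a conservative rule of thumb rather than a documented hard limit, and `max_new_tokens_budget` is a hypothetical helper name.

```python
CONTEXT_LENGTH = 8192  # training context length stated above


def max_new_tokens_budget(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens can be generated while staying within the trained context."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # Never return a negative budget when the prompt alone exceeds the context.
    return max(context_length - prompt_tokens, 0)


print(max_new_tokens_budget(8000))  # 192
print(max_new_tokens_budget(9000))  # 0
```

In practice `prompt_tokens` would come from the tokenizer, for example `len(tokenizer(prompt)["input_ids"])`, and the result can cap the `max_new_tokens` argument passed to generation.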

📦 Installation

Install the transformers library before use (quote the requirement so the shell does not treat >= as a redirect):

pip install "transformers>=4.42.3"

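🔧 Technical details

Model architecture adjustments

We adjusted the Llama 2 architecture for a total of about 4 billion parameters. For details, please refer to our technical report.

Architecture details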

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 3840, padding_idx=0)
    (layers): ModuleList(
      (0-23): 24 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=3840, out_features=3840, bias=False)
          (k_proj): Linear(in_features=3840, out_features=960, bias=False)
          (v_proj): Linear(in_features=3840, out_features=960, bias=False)
          (o_proj): Linear(in_features=3840, out_features=3840, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=3840, out_features=10240, bias=False)
          (up_proj): Linear(in_features=3840, out_features=10240, bias=False)
          (down_proj): Linear(in_features=10240, out_features=3840, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=3840, out_features=32000, bias=False)
)
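
The "about 4 billion" figure can be checked against the layer shapes printed above. This is a rough sketch under two assumptions that the printout does not state: each `LlamaRMSNorm` holds a single weight vector of the hidden size, and `lm_head` is not weight-tied to `embed_tokens`.

```python
# Shapes read from the architecture printout above.
VOCAB, HIDDEN, KV_DIM, MLP_DIM, LAYERS = 32000, 3840, 960, 10240, 24


def count_parameters() -> int:
    embed = VOCAB * HIDDEN
    # q_proj and o_proj are HIDDEN x HIDDEN; k_proj and v_proj are HIDDEN x KV_DIM.
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM
    # gate_proj, up_proj and down_proj each hold HIDDEN * MLP_DIM weights.
    mlp = 3 * HIDDEN * MLP_DIM
    # Assumption: one weight vector of size HIDDEN per RMSNorm, two per layer.
    norms = 2 * HIDDEN
    per_layer = attention + mlp + norms
    final_norm = HIDDEN
    # Assumption: the output head is a separate matrix, not tied to the embedding.
    lm_head = HIDDEN * VOCAB
    return embed + LAYERS * per_layer + final_norm + lm_head


total = count_parameters()
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
# 3,961,839,360 parameters (~3.96B)
```

The key and value projections are a quarter of the query width (960 versus 3840), which is the shape expected for grouped-query attention and reduces the size of the key/value cache at inference time.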

Quantization and sharding

You can load the model with quantization by specifying load_in_8bit=True or load_in_4bit=True. Setting device_map="auto" shards the model across multiple GPUs.
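
To get a feel for what quantization saves, the weight footprint can be estimated from the parameter count. This is a back-of-the-envelope sketch that uses the "about 4 billion" figure and counts weights only: it ignores activations, the key/value cache, and quantization metadata, and it assumes every weight is stored at the stated width, which real quantized checkpoints only approximate.

```python
PARAMS = 4e9  # "about 4 billion" parameters, as stated in this document

# Storage cost per weight for each loading mode.
BYTES_PER_PARAM = {"bfloat16": 2.0, "8-bit": 1.0, "4-bit": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return params * bytes_per_param / 1e9


for name, width in BYTES_PER_PARAM.items():
    print(f"{name}: ~{weight_memory_gb(PARAMS, width):.0f} GB")
# bfloat16: ~8 GB
# 8-bit: ~4 GB
# 4-bit: ~2 GB
```

Actual usage will be higher than these figures, so treat them as a lower bound when deciding whether a given GPU or phone can hold the model.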

📚 Detailed documentation

Model versions

Model name	Description
h2oai/h2o-danube3-4b-base	Base model
h2oai/h2o-danube3-4b-chat	Chat model

Benchmark results

Open LLM Leaderboard v1

Benchmark	Accuracy
Average	61.42
ARC-challenge	58.96
Hellaswag	80.36
MMLU	54.74
TruthfulQA	47.79
Winogrande	76.48
GSM8K	50.18
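
The reported average can be reproduced from the six task scores in the table. The check below assumes the average is a simple unweighted mean, which matches the published value.

```python
# Scores copied from the Open LLM Leaderboard v1 table above.
scores = {
    "ARC-challenge": 58.96,
    "Hellaswag": 80.36,
    "MMLU": 54.74,
    "TruthfulQA": 47.79,
    "Winogrande": 76.48,
    "GSM8K": 50.18,
}

# Unweighted mean across the six tasks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 61.42
```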

MT-Bench

First Turn: 7.28
Second Turn: 5.69
Average: 6.49

📄 License

This project is released under the Apache-2.0 license.

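⚠️ Disclaimer

Please read this disclaimer carefully before using the large language model provided in this repository. Your use of the model signifies your agreement to the following terms and conditions.

Biases and offensiveness: The large language model is trained on a diverse range of internet text data, which may contain biased, racist, offensive, or otherwise inappropriate content. By using this model, you acknowledge and accept that the generated content may sometimes exhibit biases or be offensive or inappropriate. The developers of this repository do not endorse, support, or promote any such content or viewpoints.
Limitations: The large language model is an AI-based tool and not a human. It may produce incorrect, nonsensical, or irrelevant responses. It is the user's responsibility to critically evaluate the generated content and use it at their own discretion.
Use at your own risk: Users of this large language model must assume full responsibility for any consequences that may arise from their use of the tool. The developers and contributors of this repository shall not be held liable for any damages, losses, or harm resulting from the use or misuse of the provided model.
Ethical considerations: Users are encouraged to use the large language model responsibly and ethically. By using this model, you agree not to use it for purposes that promote hate speech, discrimination, harassment, or any form of illegal or harmful activity.
Reporting issues: If you encounter any biased, offensive, or otherwise inappropriate content generated by the large language model, please report it to the repository maintainers through the provided channels. Your feedback will help improve the model and mitigate potential issues.
Changes to this disclaimer: The developers of this repository reserve the right to modify or update this disclaimer at any time without prior notice. It is the user's responsibility to review the disclaimer periodically to stay informed about any changes.

By using the large language model provided in this repository, you agree to accept and comply with the terms and conditions outlined in this disclaimer. If you do not agree with any part of this disclaimer, you should refrain from using the model and any content generated by it.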