TinyLlama-1.1B-Chat-v1.0: an open-source lightweight model for free deployment in resource-constrained settings

Tinyllama 1.1B Chat V1.0

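Developed by TinyLlama

TinyLlama is a lightweight Llama model with 1.1 billion parameters. It was pretrained on 3 trillion tokens, then fine-tuned for chat and aligned with preference optimization, which makes it suitable for resource-constrained settings.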

Large language model

Transformers

Language: English
License: Apache-2.0
Tags: #lightweight chat model #Llama 2-compatible architecture #DPO alignment

Downloads: 1.4M

Released: 12/30/2023

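Model overview

A lightweight chat model built on the Llama 2 architecture and fine-tuned on the UltraChat and UltraFeedback datasets. It supports English dialogue generation.

Model features

Lightweight design

Only 1.1 billion parameters, suited to applications with limited compute and memory.

Efficient training

Pretraining on 3 trillion tokens takes about 90 days on only 16 A100-40G GPUs.

Strong compatibility

Uses exactly the same architecture and tokenizer as Llama 2, so it works with open-source projects built on Llama.

Chat optimization

Follows the Zephyr training recipe, with fine-tuning and alignment on the UltraChat and UltraFeedback datasets.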

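Model capabilities

Text generation

Conversational interaction

Coding assistance

Use cases

Chatbots

Stylized dialogue

A custom system prompt can steer the style of the responses (for example, pirate speak).

Generates natural-language replies that match the assigned persona.

Coding assistance

Code generation

Generates code snippets in Python and other programming languages from natural-language descriptions.

For example, a function that computes the Fibonacci sequence.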

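🚀 TinyLlama-1.1B

The TinyLlama project aims to pretrain a 1.1-billion-parameter Llama model on 3 trillion tokens. With proper optimization, this can be achieved in "just" 90 days using 16 A100-40G GPUs 🚀🚀. Training started on September 1, 2023.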
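
The training budget above implies a required throughput, which a short calculation makes concrete. This is a back-of-the-envelope sketch only: it assumes the 16 GPUs run continuously for the full 90 days with no downtime, and `tokens_per_second_per_gpu` is a hypothetical helper name introduced for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tokens_per_second_per_gpu(total_tokens: float, days: float, num_gpus: int) -> float:
    """Average throughput each GPU must sustain to finish within the budget."""
    return total_tokens / (days * SECONDS_PER_DAY * num_gpus)


# Figures taken from the text: 3 trillion tokens, 90 days, 16 GPUs.
rate = tokens_per_second_per_gpu(3e12, 90, 16)
print(f"{rate:,.0f} tokens/s per GPU")  # roughly 24 thousand
```

Any interruption or restart during training raises the rate needed for the remaining time.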

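The model adopts exactly the same architecture and tokenizer as Llama 2, so TinyLlama can be plugged into many open-source projects built on Llama. With only 1.1 billion parameters it is also compact, which lets it serve applications that demand a restricted compute and memory footprint.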
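
To put the memory footprint in numbers, the sketch below estimates the size of the weights alone. It assumes a rounded count of 1.1e9 parameters and ignores activations, the KV cache, and framework overhead, so real usage will be higher. `weight_memory_gib` is a hypothetical helper, not a library function.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate size of the model weights in GiB for a given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: {weight_memory_gib(1.1e9, dtype):.2f} GiB")
```

In bfloat16 the weights come to roughly 2 GiB, about half of what float32 requires.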

🚀 Quick start

You need transformers >= 4.34. See the TinyLlama GitHub page for more information.

# Install transformers from source - only needed for versions <= v4.34
# pip install git+https://github.com/huggingface/transformers.git
# pip install accelerate

import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16, device_map="auto")

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
# <|system|>
# You are a friendly chatbot who always responds in the style of a pirate</s>
# <|user|>
# How many helicopters can a human eat in one sitting?</s>
# <|assistant|>
# ...
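
The commented output above shows the prompt layout the chat template produces. As a rough illustration, the pure-Python sketch below reproduces that layout. `render_chat` is a hypothetical helper, not part of transformers; the real formatting is defined by the tokenizer's own template and may differ in whitespace or other details, so prefer `apply_chat_template` in practice.

```python
EOS_TOKEN = "</s>"  # end-of-sequence marker as it appears in the sample output


def render_chat(messages, add_generation_prompt=True):
    """Render messages as '<|role|>' headers followed by content and EOS."""
    parts = []
    for message in messages:
        parts.append(f"<|{message['role']}|>\n{message['content']}{EOS_TOKEN}\n")
    if add_generation_prompt:
        # An open assistant header cues the model to write the next turn.
        parts.append("<|assistant|>\n")
    return "".join(parts)


example = [
    {"role": "system", "content": "You are a friendly chatbot"},
    {"role": "user", "content": "Hello!"},
]
print(render_chat(example))
```

Seeing the layout spelled out helps when debugging prompts or when building them in an environment where the tokenizer is not available.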

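✨ Key features

Architecture compatibility: uses the same architecture and tokenizer as Llama 2, so it integrates easily into open-source projects built on Llama.
Compact size: only 1.1 billion parameters, suited to applications with tight compute and memory budgets.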

📦 Installation

You need transformers >= 4.34. Install it from source if needed:

# Install transformers from source - only needed for versions <= v4.34
pip install git+https://github.com/huggingface/transformers.git
pip install accelerate
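
Before installing from source, it can help to check whether the installed version already meets the requirement. The sketch below is one way to do that with the standard library. `meets_minimum` is a hypothetical helper that compares only the major and minor components, which is an assumption that suffices for a ">= 4.34" check.

```python
import re
from importlib.metadata import PackageNotFoundError
from importlib.metadata import version as installed_version


def meets_minimum(version: str, minimum: tuple = (4, 34)) -> bool:
    """Return True if a version string such as '4.35.0.dev0' is at least `minimum`."""
    # Keep only the leading numeric major.minor components.
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


if __name__ == "__main__":
    try:
        found = installed_version("transformers")
        status = "OK" if meets_minimum(found) else "too old, upgrade needed"
        print(f"transformers {found}: {status}")
    except PackageNotFoundError:
        print("transformers is not installed")
```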

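📚 Documentation

This model

This chat model is fine-tuned from TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T, following Hugging Face's Zephyr training recipe. The model was first fine-tuned on a variant of the UltraChat dataset, which contains a diverse range of synthetic dialogues generated by ChatGPT. It was then further aligned with 🤗 TRL's DPOTrainer on the openbmb/UltraFeedback dataset, which contains 64k prompts and model completions ranked by GPT-4.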
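
For readers unfamiliar with DPO, the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities. It illustrates the objective only and is not the TRL implementation or the authors' training code. The `beta` value of 0.1 is an assumed illustrative setting, since the text does not state the hyperparameters used.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of completions."""
    # How much more likely each completion is under the policy than the reference.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written in a numerically stable softplus form.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


# When the policy equals the reference, the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Shifting probability toward the chosen completion lowers the loss.
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

The loss falls as the policy favors the chosen completion over the rejected one more strongly than the reference model does.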