🚀 Chinese Llama2 7B Model
This is a fully open-source, commercially usable Chinese Llama2 model together with a Chinese/English SFT dataset. Its input format strictly follows the llama-2-chat convention, so it remains compatible with all optimizations built for the original llama-2-chat model.
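Because the format is the standard llama-2-chat one, multi-turn conversations can be laid out exactly as for the original model. Below is a minimal sketch of that layout; the build_prompt helper is purely illustrative and not part of this repository:

# Illustrative helper showing the standard llama-2-chat multi-turn layout;
# only the template comes from the llama-2-chat convention, the function itself is ours.
def build_prompt(system, turns):
    """turns: list of (user, assistant) pairs; use None for the reply to be generated."""
    prompt = "[INST] <<SYS>>\n{}\n<</SYS>>\n\n".format(system)
    for i, (user, assistant) in enumerate(turns):
        if i > 0:
            prompt += "[INST] "
        prompt += "{} [/INST]".format(user)
        if assistant is not None:
            prompt += " {} </s><s>".format(assistant)
    return prompt

# Example: the model is asked to answer the last [INST] turn.
prompt = build_prompt(
    "You are a helpful assistant.",
    [("什么是夫妻肺片?", "夫妻肺片是一道经典的四川凉菜。"), ("用英文再解释一次。", None)],
)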

🚀 Quick Start
Basic Demo

Online Demo
Talk is cheap, show me the demo.
Resource Downloads
We used a combined Chinese and English SFT dataset totaling 10 million samples.
Quick Test
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

model_path = "LinkSoul/Chinese-Llama-2-7b"

# Load the slow (SentencePiece) tokenizer and the model in fp16 on the GPU.
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_path).half().cuda()

# Stream generated tokens to stdout, skipping the prompt and special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# llama-2-chat prompt template: system message inside <<SYS>>, user turn inside [INST] ... [/INST].
instruction = """[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\n{} [/INST]"""

prompt = instruction.format("用英文回答,什么是夫妻肺片?")
generate_ids = model.generate(tokenizer(prompt, return_tensors='pt').input_ids.cuda(), max_new_tokens=4096, streamer=streamer)
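With skip_prompt=True the streamer only prints the model's reply as it is generated; generate_ids additionally holds the full token sequence (prompt plus completion), which can be decoded afterwards with tokenizer.batch_decode if you need the answer as a string.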
✨ Key Features
- Fully open-source and commercially usable, which makes it easy for developers to build on.
- The input format strictly follows the llama-2-chat convention, so optimizations built for the original llama-2-chat model apply directly (see the sketch after this list).
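As one concrete example of that compatibility, the model can be loaded with the same memory-saving recipes commonly used for the original llama-2-chat checkpoints. A minimal sketch, assuming transformers with bitsandbytes installed; the 4-bit configuration below is our assumption, not something the project documents:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_path = "LinkSoul/Chinese-Llama-2-7b"

# 4-bit NF4 quantization, the recipe commonly used for llama-2-chat models;
# assumes bitsandbytes and a CUDA GPU are available.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_path, quantization_config=bnb_config, device_map="auto")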
📦 Installation
The original documentation does not give explicit installation steps; refer to the training and inference code at https://github.com/LinkSoul-AI/Chinese-Llama-2-7b. A rough dependency sketch is given below.
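The quick-test snippet relies only on the standard Hugging Face stack, so (as an assumption, since the project pins no versions) something like the following should be enough:

pip install torch transformers sentencepiece accelerate
pip install bitsandbytes  # only needed for the optional 4-bit loading sketch above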
💻 Usage Examples
Basic Usage
The basic usage is identical to the quick-test snippet in the 🚀 Quick Start section above.
📚 Documentation
For more details, refer to the training and inference code repository: https://github.com/LinkSoul-AI/Chinese-Llama-2-7b
📄 License
This project is released under the Apache-2.0 license.
Related Projects
WeChat Group
You are welcome to join our WeChat group.