🚀 Chinese Llama2 7B Model
This is a fully open-source, commercially usable Chinese Llama2 model, released together with Chinese-English SFT datasets. Its input format strictly follows the llama-2-chat format, so it is compatible with all optimizations targeting the original llama-2-chat model.

🚀 Quick Start
Basic Demo

Online Demo
Talk is cheap, show me the demo.
Resource Download
We used Chinese and English SFT datasets with a total of 10 million samples.
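If the SFT data is published on the Hugging Face Hub, it can be inspected with the `datasets` library. The sketch below is illustrative only; the dataset identifier is an assumption, so check the project repository for the actual location.

```python
# Minimal sketch: inspect the SFT data with the `datasets` library.
# NOTE: "LinkSoul/instruction_merge_set" is an assumed dataset identifier,
# not confirmed by this card; see the project repository for the real one.
from datasets import load_dataset

sft_data = load_dataset("LinkSoul/instruction_merge_set", split="train")
print(sft_data)     # number of rows and column names
print(sft_data[0])  # one instruction-tuning sample
```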
Quick Test
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

model_path = "LinkSoul/Chinese-Llama-2-7b"

# Load the tokenizer and the model (fp16 weights on a single CUDA GPU).
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_path).half().cuda()
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# llama-2-chat prompt template: [INST] <<SYS>>\n{system prompt}\n<</SYS>>\n\n{user message} [/INST]
instruction = """[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\n{} [/INST]"""

prompt = instruction.format("用英文回答,什麼是夫妻肺片?")
# Generated tokens are streamed to stdout as they are produced.
generate_ids = model.generate(tokenizer(prompt, return_tensors='pt').input_ids.cuda(), max_new_tokens=4096, streamer=streamer)
```
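The `TextStreamer` prints tokens to stdout as they are generated; `generate` also returns the full token ids (prompt plus completion). A small follow-up sketch for turning them into a plain string:

```python
# Decode the returned ids into text, dropping the prompt tokens.
prompt_len = tokenizer(prompt, return_tensors='pt').input_ids.shape[-1]
response = tokenizer.batch_decode(generate_ids[:, prompt_len:], skip_special_tokens=True)[0]
print(response)
```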
✨ Key Features
- Fully open-source and commercially usable, which is a great convenience for developers.
- The input format strictly follows the llama-2-chat format, so the model works with optimizations built for the original llama-2-chat model (a small helper sketch follows this list).
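As a convenience, the template shown in the quick test above can be wrapped in a small helper. This is a minimal sketch for single-turn prompts; the shortened system prompt is illustrative, not the full one used in the examples.

```python
# Minimal sketch: wrap a user message in the llama-2-chat single-turn template.
DEFAULT_SYSTEM_PROMPT = (
    "You are a helpful, respectful and honest assistant. "
    "Always answer as helpfully as possible, while being safe."
)

def build_prompt(user_message: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> str:
    return f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"

prompt = build_prompt("用英文回答,什麼是夫妻肺片?")
```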
📦 Installation
No specific installation steps are documented here; refer to the training and inference code at https://github.com/LinkSoul-AI/Chinese-Llama-2-7b.
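The examples in this card only need the standard `transformers` + PyTorch stack and a CUDA-capable GPU. A minimal environment check, assuming that setup:

```python
# Quick environment check (assumes transformers and torch are installed).
import torch
import transformers

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())  # .half().cuda() in the examples needs a GPU
```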
💻 Usage Examples
Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

model_path = "LinkSoul/Chinese-Llama-2-7b"

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_path).half().cuda()
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

instruction = """[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\n{} [/INST]"""

prompt = instruction.format("用英文回答,什麼是夫妻肺片?")
generate_ids = model.generate(tokenizer(prompt, return_tensors='pt').input_ids.cuda(), max_new_tokens=4096, streamer=streamer)
```
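The example above loads the weights in fp16 on a single GPU (roughly 14 GB for a 7B model). If GPU memory is tight, 8-bit loading via bitsandbytes is a common alternative; a sketch, assuming the `bitsandbytes` and `accelerate` packages are installed (they are not covered by this card):

```python
# Alternative loading path for smaller GPUs: int8 weights via bitsandbytes.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_8bit = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",  # let accelerate place layers on the available device(s)
)
```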
📚 Documentation
For more details, see the training and inference code repository: https://github.com/LinkSoul-AI/Chinese-Llama-2-7b
📄 License
This project is licensed under the Apache-2.0 license.
Related Projects
WeChat Group
You are welcome to join our WeChat group.
Information Table