Jamba-v0.1-chat-multilingual: an open-source chat model for fluent multilingual conversation

Jamba V0.1 Chat Multilingual

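Developed by lightblue

A chatbot model fine-tuned from ai21labs/Jamba-v0.1. It supports multilingual conversation and, after a few hours of QLoRA fine-tuning, converses reasonably fluently in English and other languages.

Large language model

Transformers

License: Apache-2.0 #256K long context #Multilingual chat #Efficient QLoRA fine-tuning

Downloads: 22

Released: 3/30/2024

Model overview

This model is a small-scale experimental training run that explores how to fine-tune Jamba into a chatbot. Initial testing shows that it converses reasonably fluently in English and other languages.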

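Model features

Multilingual support

The model can converse in multiple languages, such as English, Japanese, and Polish.

Long-context processing

Supports a context length of up to 256K, which suits long conversations and complex tasks.

Fast fine-tuning

A few hours of QLoRA fine-tuning were enough to give the model good conversational ability.

System message control

System messages can steer the model's behavior, for example its answer style or language difficulty.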

Model capabilities

Text generation

Multilingual conversation

System message steering

Multi-turn conversation

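Use cases

Chatbot

English conversation

The model can hold fluent English conversations and answer a wide range of questions.

Responses are reasonable and fluent, but the model may hallucinate incorrect information in some domains.

Multilingual conversation

The model replies in the language of the prompt and supports multiple languages.

Testing shows that the model replies in the prompt language fairly reliably.

System message control

System messages can control the answer style, for example very simple English, complex English, or rhyming answers.

Results are mixed, but the model generally achieves the requested effect.

Information lookup

Fact lookup

Answers factual questions on a variety of topics.

Performs well in some domains but produces incorrect information in others.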

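🚀 Jamba-v0.1 Chat Multilingual model

This is a chat model built with the transformers library: ai21labs/Jamba-v0.1 was fine-tuned so that it can serve as a chatbot. In initial testing it performed well in conversations in English and other languages.

🚀 Quick start

This section shows how to start generating text with the model.

Install dependencies

Jamba requires transformers version 4.39.0 or higher (the quotes stop the shell from treating >= as a redirect):

pip install "transformers>=4.39.0"

To run the optimized Mamba implementation, first install mamba-ssm and causal-conv1d:

pip install mamba-ssm "causal-conv1d>=1.2.0"

The model must also be placed on a CUDA device.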
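
Before loading the model, it can help to confirm that the installed packages meet the minimum versions quoted above. The following is a minimal sketch using only the standard library. The helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical, the version parsing is deliberately simplistic (it ignores pre-release suffixes), and the distribution names are assumed to match the names used in the pip commands.

```python
from importlib import metadata

# Minimum versions quoted in the install notes above.
MINIMUMS = {"transformers": "4.39.0", "causal-conv1d": "1.2.0"}


def parse_version(text):
    """Turn '4.39.0' or '4.40.0.dev0' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))  # pad so '4.39' compares equal to '4.39.0'
    b = b + (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUMS):
    """Map each package name to 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else "too old"
    return report


if __name__ == "__main__":
    print(check_environment())
```

A "missing" or "too old" entry in the report points at the pip command to rerun. The check does not cover the CUDA requirement.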

Code example

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

double_quant_config = BitsAndBytesConfig(
   load_in_4bit=True,
   bnb_4bit_use_double_quant=True,
   bnb_4bit_compute_dtype=torch.float16
)

model = AutoModelForCausalLM.from_pretrained("lightblue/Jamba-v0.1-chat-multilingual", device_map="auto", quantization_config=double_quant_config, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("lightblue/Jamba-v0.1-chat-multilingual")

input_text = """<|im_start|>system 
You are a helpful AI assistant.
<|im_end|> 
<|im_start|>user
What is the most interesting fact about kangaroos that you know?
<|im_end|> 
<|im_start|>assistant
"""

input_ids = tokenizer(input_text, return_tensors='pt').to(model.device)["input_ids"]

outputs = model.generate(input_ids, max_new_tokens=256, temperature=0.0, repetition_penalty=1.1)

print(tokenizer.batch_decode(outputs)[0])
# <|startoftext|><|im_start|>system 
# You are a helpful AI assistant.
# <|im_end|> 
# <|im_start|>user
# What is the most interesting fact about kangaroos that you know?
# <|im_end|> 
# <|im_start|>assistant
# One of the most interesting facts about kangaroos is their unique reproductive system, which involves embryonic diapause and multiple births. Female kangaroos can have up to three joeys at different stages of development simultaneously: one fully developed in the pouch, another developing inside her body, and an embryo waiting for its turn.<|im_end|>
# <|endoftext|>
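
The decoded output above contains the whole conversation, including the prompt and the special tokens. A chat application usually wants only the final assistant turn. The following is a minimal sketch of one way to pull it out with plain string handling. `extract_last_reply` is a hypothetical helper, not part of the model or of transformers, and it assumes the output follows the `<|im_start|>` / `<|im_end|>` layout shown above.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def extract_last_reply(decoded):
    """Return the text of the final assistant turn in a decoded ChatML string."""
    marker = IM_START + "assistant"
    start = decoded.rfind(marker)
    if start == -1:
        return ""  # no assistant turn found
    reply = decoded[start + len(marker):]
    end = reply.find(IM_END)
    if end != -1:
        reply = reply[:end]  # drop the end marker and anything after it
    return reply.strip()


# Shortened stand-in for tokenizer.batch_decode(outputs)[0].
decoded = (
    "<|startoftext|><|im_start|>system \nYou are a helpful AI assistant.\n<|im_end|> \n"
    "<|im_start|>user\nHi there\n<|im_end|> \n"
    "<|im_start|>assistant\nHello! How can I help?<|im_end|>\n<|endoftext|>"
)
print(extract_last_reply(decoded))
```

If generation stops at `max_new_tokens` before the model emits `<|im_end|>`, the helper returns everything generated so far.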

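Notes

This code automatically adds the "<|startoftext|>" special token to the beginning of any input. Every input needs this token at inference time, because initial testing shows that leaving it out leads to output errors.
You can run the model without the optimized Mamba kernels, but this is not recommended because it significantly increases latency. To do so, specify use_mamba_kernels=False when loading the model.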
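
The `use_mamba_kernels=False` option mentioned above is passed when the model is loaded. The following is a minimal sketch of how the load call could be parameterized. `build_load_kwargs` and `load_model` are hypothetical helpers, and the other keyword arguments simply mirror the quick-start code.

```python
MODEL_ID = "lightblue/Jamba-v0.1-chat-multilingual"


def build_load_kwargs(use_mamba_kernels=True):
    """Keyword arguments for from_pretrained (hypothetical helper)."""
    kwargs = {"device_map": "auto", "trust_remote_code": True}
    if not use_mamba_kernels:
        # Fall back to the slower implementation without the optimized kernels.
        kwargs["use_mamba_kernels"] = False
    return kwargs


def load_model(use_mamba_kernels=True, quantization_config=None):
    """Load the chat model, optionally without the optimized Mamba kernels."""
    # Imported lazily so build_load_kwargs can be used without transformers installed.
    from transformers import AutoModelForCausalLM

    kwargs = build_load_kwargs(use_mamba_kernels)
    if quantization_config is not None:
        kwargs["quantization_config"] = quantization_config
    return AutoModelForCausalLM.from_pretrained(MODEL_ID, **kwargs)


print(build_load_kwargs(use_mamba_kernels=False))
```

Calling `load_model(use_mamba_kernels=False)` would then load the model on the slower path. As noted above, expect noticeably higher latency.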

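✨ Key features

Multilingual support: the model is trained to perform well in conversations in English and other languages.
Fine-tuning experiment: ai21labs/Jamba-v0.1 was fine-tuned to explore its potential as a chatbot.
Controllable output: system messages can steer the output, for example its language difficulty or whether it rhymes.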

💻 Usage examples

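Advanced usage

You can control the model's output through the system message, for example by asking it to answer in simple English, complex English, or rhyme.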

# System message asks for very simple English.
# The tokenizer adds <|startoftext|> itself, so the prompt does not include it.
input_text = """<|im_start|>system 
You are a helpful AI assistant. You write all answers in very simple English.
<|im_end|> 
<|im_start|>user
Write a 50 word analysis of why sausages are better than bacon.
<|im_end|> 
<|im_start|>assistant
"""

input_ids = tokenizer(input_text, return_tensors='pt').to(model.device)["input_ids"]

outputs = model.generate(input_ids, max_new_tokens=256, temperature=0.0, repetition_penalty=1.1)

print(tokenizer.batch_decode(outputs)[0])

# System message asks for very complex English.
# The tokenizer adds <|startoftext|> itself, so the prompt does not include it.
input_text = """<|im_start|>system 
You are a helpful AI assistant. You write all answers in very complex English.
<|im_end|> 
<|im_start|>user
Write a 50 word analysis of why sausages are better than bacon.
<|im_end|> 
<|im_start|>assistant
"""

input_ids = tokenizer(input_text, return_tensors='pt').to(model.device)["input_ids"]

outputs = model.generate(input_ids, max_new_tokens=256, temperature=0.0, repetition_penalty=1.1)

print(tokenizer.batch_decode(outputs)[0])

# System message asks for rhyming answers.
# The tokenizer adds <|startoftext|> itself, so the prompt does not include it.
input_text = """<|im_start|>system 
You are an AI assistant that answers all questions in rhyme.
<|im_end|> 
<|im_start|>user
Why is the sky blue?
<|im_end|> 
<|im_start|>assistant
"""

input_ids = tokenizer(input_text, return_tensors='pt').to(model.device)["input_ids"]

outputs = model.generate(input_ids, max_new_tokens=256, temperature=0.0, repetition_penalty=1.1)

print(tokenizer.batch_decode(outputs)[0])
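
The prompts above all share one layout and differ only in the system message and the user turn. The following is a minimal sketch of a helper that builds such prompts, including multi-turn ones, from a list of (role, content) pairs. `build_chatml_prompt` is a hypothetical name, and the exact whitespace around the markers is an assumption modeled on the examples above (their trailing spaces are omitted). The helper does not add `<|startoftext|>`, on the assumption that the tokenizer adds it as described in the notes.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Format (role, content) pairs in the layout used by the prompts above."""
    parts = []
    for role, content in messages:
        parts.append(f"<|im_start|>{role}\n{content}\n<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


conversation = [
    ("system", "You are an AI assistant that answers all questions in rhyme."),
    ("user", "Why is the sky blue?"),
]
prompt = build_chatml_prompt(conversation)
print(prompt)
```

For a multi-turn conversation, append the previous reply as an ("assistant", ...) pair followed by the next ("user", ...) pair, then rebuild the prompt.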

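📚 Detailed documentation

Model details

Property	Details
Model type	Joint Attention and Mamba (Jamba)
License	Apache 2.0
Context length	256K
Knowledge cutoff date	March 5, 2024

Initial test results

Knowledge: the model shows decent knowledge in some domains but may produce incorrect information in others.
System message control: system messages steer the output fairly easily, for example its language difficulty or whether it rhymes.
Multi-turn accuracy: initial testing shows acceptable accuracy in multi-turn conversations.
Multilingual testing: the model replies fairly reliably in the language of the input.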

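Training details

Training data

jondurbin/airoboros-3.2: an English dataset of about 59K LLM task examples, mostly generated by GPT-4. It covers a wide variety of tasks, so it was chosen as the main training data.
openchat/openchat_sharegpt4_dataset (GPT-4 responses only): a multilingual, multi-turn conversation dataset of about 6K examples of users talking with GPT-4. jondurbin/airoboros-3.2 contains almost no multilingual data, and because the developers are a Japanese AI company that needs the model to output Japanese, this dataset was added to supply multilingual data.

Data preparation code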

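import os

# Set the Hugging Face environment variables before importing datasets,
# because the cache location and transfer backend are read at import time.
os.environ['HF_HOME'] = "/workspace/hf_home"
os.environ['HF_HUB_ENABLE_HF_TRANSFER'] = "1"

import pandas as pd
from datasets import load_dataset, Dataset, concatenate_datasets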

boros_dataset = load_dataset("jondurbin/airoboros-3.2", split='train')

gpt4_df = pd.read_json("https://huggingface.co/datasets/openchat/openchat_sharegpt4_dataset/resolve/main/sharegpt_gpt4.json?download=true")
gpt4_df["conversations"] = gpt4_df["items"].apply(lambda x: [{'from': 'system', 'value': 'You are GPT-4, a helpful assistant.'}] + x)

gpt4_dataset = Dataset.from_pandas(gpt4_df[["conversations"]])

dataset = concatenate_datasets([gpt4_dataset, boros_dataset]).shuffle()

dataset.select_columns(["conversations"]).to_json("/workspace/airoboros-3.2_plus_openchat_sharegpt4.json")
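
The lambda in the script above prepends a fixed system turn to every ShareGPT conversation before the two datasets are merged. The following is a minimal sketch of that step on a toy record, so the resulting shape is easy to see. `add_system_turn` is a hypothetical name, and the "human" and "gpt" role labels in the toy data are an assumption based on the usual ShareGPT convention, not something checked against the actual file.

```python
SYSTEM_TURN = {"from": "system", "value": "You are GPT-4, a helpful assistant."}


def add_system_turn(items):
    """Return a new conversation with the system turn placed first."""
    return [dict(SYSTEM_TURN)] + list(items)


# Toy record; the real rows come from sharegpt_gpt4.json.
toy_items = [
    {"from": "human", "value": "こんにちは"},
    {"from": "gpt", "value": "こんにちは！ご用件をどうぞ。"},
]

conversation = add_system_turn(toy_items)
for turn in conversation:
    print(turn["from"], "->", turn["value"])
```

The helper returns a new list and leaves the input unchanged, which matches the list concatenation used in the original lambda.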

Training configuration

base_model: ai21labs/Jamba-v0.1
trust_remote_code: true

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: /workspace/airoboros-3.2_plus_openchat_sharegpt4.json
    ds_type: json
    type: sharegpt
    conversation: chatml
dataset_prepared_path:
val_set_size: 0.01
output_dir: ./airoboros-3.2_plus_openchat_sharegpt4_one_epoch

sequence_len: 6000
sample_packing: true
pad_to_sequence_len: false
eval_sample_packing: true

use_wandb: true
wandb_project: axolotl
wandb_entity: peterd
wandb_name: airoboros-3.2_plus_openchat_sharegpt4

adapter: qlora
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true

low_cpu_mem_usage: true
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 5
saves_per_epoch: 5
debug:
deepspeed: /workspace/axolotl/deepspeed_configs/zero2.json
weight_decay: 0.0
special_tokens:
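
A few quantities implied by the configuration above can be worked out with simple arithmetic. The following is a minimal sketch, not a description of what the training framework computes internally. The number of GPUs and the total step count are not stated in the source, so `num_gpus` and `total_steps` are hypothetical parameters. The schedule assumes the common form of linear warmup followed by cosine decay to zero, and the LoRA scaling assumes the standard alpha / r rule.

```python
import math

# Values copied from the configuration above.
MICRO_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 4
SEQUENCE_LEN = 6000
LORA_R = 8
LORA_ALPHA = 16
LEARNING_RATE = 0.0002
WARMUP_STEPS = 10


def effective_batch_size(num_gpus):
    """Packed sequences consumed per optimizer step (num_gpus is an assumption)."""
    return MICRO_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_gpus


def max_tokens_per_step(num_gpus):
    """Upper bound on tokens per optimizer step when every sequence is fully packed."""
    return effective_batch_size(num_gpus) * SEQUENCE_LEN


def lora_scaling():
    """Factor applied to the low-rank update under the standard alpha / r rule."""
    return LORA_ALPHA / LORA_R


def cosine_lr(step, total_steps):
    """Linear warmup to LEARNING_RATE, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(num_gpus=1))
print(max_tokens_per_step(num_gpus=1))
print(lora_scaling())
print(cosine_lr(step=10, total_steps=1000))
```

With a single GPU, each optimizer step sees 4 packed sequences of at most 6000 tokens each, and the learning rate peaks at 0.0002 once the 10 warmup steps are done.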