Vapor_v2_7B开源大语言模型 - 支持13种语言处理的高效交流工具

首页

Vapor V2 7B

由 FourOhFour 开发

基于Qwen/Qwen2.5-7B模型在多语言数据集上微调的大语言模型，支持13种语言处理

大型语言模型

Transformers

开源协议:Apache-2.0 #多语言对话 #长文本处理 #指令微调

下载量 60

发布时间 : 9/20/2024

模型简介

这是一个基于Qwen2.5-7B模型微调的多语言大语言模型，专注于对话生成和指令跟随任务，在多种专业领域数据集上进行了训练

模型特点

多语言支持

支持13种语言的文本生成和理解，包括主要亚洲和欧洲语言

长上下文处理

支持8192个token的长上下文处理能力

多领域知识

在医学、军事、推理等多个专业领域数据集上进行训练

高效训练

使用flash attention和梯度检查点等技术优化训练效率

模型能力

多语言文本生成

指令跟随

对话系统

知识问答

专业领域咨询

使用案例

智能助手

多语言客服机器人

为跨国企业提供多语言客户服务支持

教育

语言学习助手

帮助学习者练习多种语言的写作和对话

专业咨询

医学信息咨询

提供基础医学知识和健康建议

军事生存指南

提供军事和野外生存相关专业知识

🚀 输出模型（outputs/out）

本项目基于 transformers 库，所训练的模型是 Qwen/Qwen2.5 - 7B 在特定数据集上的微调版本。该模型支持多种语言，包括中文、英文、法文、西班牙文等。本项目使用 axolotl 工具进行训练，以下是详细介绍。

查看 axolotl 配置

axolotl 版本：0.4.1

base_model: Qwen/Qwen2.5-7B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: PocketDoc/Dans-MemoryCore-CoreCurriculum-Small
    type: sharegpt
    conversation: chatml
  - path: NewEden/Kalo-Opus-Instruct-22k-Refusal-Murdered
    type: sharegpt
    conversation: chatml
  - path: Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
    type: sharegpt
    conversation: chatml
  - path: NewEden/Gryphe-Sonnet-3.5-35k-Subset
    type: sharegpt
    conversation: chatml
  - path: Nitral-AI/Reasoning-1shot_ShareGPT
    type: sharegpt
    conversation: chatml
  - path: Nitral-AI/GU_Instruct-ShareGPT
    type: sharegpt
    conversation: chatml
  - path: Nitral-AI/Medical_Instruct-ShareGPT
    type: sharegpt
    conversation: chatml
  - path: AquaV/Resistance-Sharegpt
    type: sharegpt
    conversation: chatml
  - path: AquaV/US-Army-Survival-Sharegpt
    type: sharegpt
    conversation: chatml
  - path: Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
    type: sharegpt
    conversation: chatml

chat_template: chatml

val_set_size: 0.002
output_dir: ./outputs/out

adapter:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:

sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

wandb_project: qwen7B
wandb_entity:
wandb_watch:
wandb_name: qwen7B
wandb_log_model:

gradient_accumulation_steps: 32
micro_batch_size: 1
num_epochs: 2
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00001
weight_decay: 0.05

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 2

debug:
deepspeed: 
fsdp:
fsdp_config:

special_tokens:
  pad_token: <pad>

🚀 快速开始

此模型是 Qwen/Qwen2.5 - 7B 在特定数据集上的微调版本，在评估集上取得了以下结果：

损失值：0.7923

📚 详细文档

模型描述

更多信息待补充。

预期用途与限制

更多信息待补充。

训练和评估数据

更多信息待补充。

🔧 技术细节

训练超参数

训练过程中使用了以下超参数：

学习率：1e - 05
训练批次大小：1
评估批次大小：1
随机种子：42
分布式类型：多GPU
设备数量：4
梯度累积步数：32
总训练批次大小：128
总评估批次大小：4
优化器：Adam，β=(0.9, 0.999)，ε = 1e - 08
学习率调度器类型：余弦
学习率调度器热身步数：46
训练轮数：2

训练结果

训练损失	轮数	步数	验证损失
1.0297	0.0043	1	1.1468
0.8512	0.2515	58	0.8729
0.8496	0.5030	116	0.8193
0.8175	0.7546	174	0.8033
0.7868	1.0041	232	0.7961
0.8119	1.2555	290	0.7934
0.799	1.5069	348	0.7926
0.7891	1.7583	406	0.7923