# Transformers Model README
This repository provides a fine-tuned version of Qwen/Qwen2.5-7B built with the transformers library, together with its performance on the evaluation set.

## Quick Start

This model is a fine-tuned version of Qwen/Qwen2.5-7B, trained on the datasets listed in the axolotl configuration below. It achieves a loss of 0.7923 on the evaluation set.
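The snippet below is a minimal inference sketch using the transformers library. The repository id is a placeholder (the published model name is not given in this card), and the prompt and generation settings are illustrative only.

```python
# Minimal inference sketch. The repo id below is a placeholder, not the actual model name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/qwen2.5-7b-finetune"  # placeholder: replace with the real repo id or local path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick bf16/fp16 automatically from the checkpoint
    device_map="auto",    # requires accelerate; remove to load on a single device manually
)

# The model was trained with the ChatML chat template, so apply_chat_template handles formatting.
messages = [{"role": "user", "content": "Summarize the water cycle in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```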

### Axolotl config

Axolotl version 0.4.1 was used. The detailed configuration is as follows:

```yaml
base_model: Qwen/Qwen2.5-7B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
  - path: PocketDoc/Dans-MemoryCore-CoreCurriculum-Small
    type: sharegpt
    conversation: chatml
  - path: NewEden/Kalo-Opus-Instruct-22k-Refusal-Murdered
    type: sharegpt
    conversation: chatml
  - path: Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
    type: sharegpt
    conversation: chatml
  - path: NewEden/Gryphe-Sonnet-3.5-35k-Subset
    type: sharegpt
    conversation: chatml
  - path: Nitral-AI/Reasoning-1shot_ShareGPT
    type: sharegpt
    conversation: chatml
  - path: Nitral-AI/GU_Instruct-ShareGPT
    type: sharegpt
    conversation: chatml
  - path: Nitral-AI/Medical_Instruct-ShareGPT
    type: sharegpt
    conversation: chatml
  - path: AquaV/Resistance-Sharegpt
    type: sharegpt
    conversation: chatml
  - path: AquaV/US-Army-Survival-Sharegpt
    type: sharegpt
    conversation: chatml
  - path: Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
    type: sharegpt
    conversation: chatml
chat_template: chatml
val_set_size: 0.002
output_dir: ./outputs/out
adapter:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:
sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true
wandb_project: qwen7B
wandb_entity:
wandb_watch:
wandb_name: qwen7B
wandb_log_model:
gradient_accumulation_steps: 32
micro_batch_size: 1
num_epochs: 2
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00001
weight_decay: 0.05
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_ratio: 0.1
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 2
debug:
deepspeed:
fsdp:
fsdp_config:
special_tokens:
  pad_token: <pad>
```
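All of the datasets above are loaded in ShareGPT format and rendered with the ChatML template. As an illustration only (the record below is invented, not taken from any of the listed datasets), a ShareGPT-style turn and a hand-rolled ChatML rendering of it might look like this sketch; axolotl performs the equivalent conversion internally.

```python
# Illustrative only: a ShareGPT-style record and the ChatML text it roughly corresponds to.
sample = {
    "conversations": [
        {"from": "human", "value": "What is sample packing?"},
        {"from": "gpt", "value": "Packing several short examples into one sequence to reduce padding."},
    ]
}

ROLE_MAP = {"human": "user", "gpt": "assistant"}

def to_chatml(record):
    """Render a ShareGPT-style conversation as ChatML-formatted text."""
    parts = []
    for turn in record["conversations"]:
        role = ROLE_MAP[turn["from"]]
        parts.append(f"<|im_start|>{role}\n{turn['value']}<|im_end|>")
    return "\n".join(parts)

print(to_chatml(sample))
```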
## Documentation

### Model description

This model is a fine-tuned version of Qwen/Qwen2.5-7B. More detailed information about the model has yet to be provided.

### Intended uses & limitations

More information is needed about the intended uses and limitations of this model.
### Training and evaluation data
Details about the training and evaluation data are yet to be provided.
### Training procedure

#### Training hyperparameters

The following hyperparameters were used during training:
| Property | Details |
| --- | --- |
| learning_rate | 1e-05 |
| train_batch_size | 1 |
| eval_batch_size | 1 |
| seed | 42 |
| distributed_type | multi-GPU |
| num_devices | 4 |
| gradient_accumulation_steps | 32 |
| total_train_batch_size | 128 |
| total_eval_batch_size | 4 |
| optimizer | Adam with betas=(0.9,0.999) and epsilon=1e-08 |
| lr_scheduler_type | cosine |
| lr_scheduler_warmup_steps | 46 |
| num_epochs | 2 |
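The total train batch size follows directly from the per-device settings: micro_batch_size × gradient_accumulation_steps × num_devices = 1 × 32 × 4 = 128. The short sketch below reproduces that arithmetic, along with the total step count implied by the warmup settings (an estimate derived from the numbers above, not a value stated in this card).

```python
# Arithmetic check of derived quantities from the hyperparameters above.
micro_batch_size = 1
gradient_accumulation_steps = 32
num_devices = 4

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 128, matching the table

# warmup_ratio of 0.1 with 46 warmup steps implies roughly 460 optimizer steps over the
# 2 epochs, consistent with the ~406 steps logged at epoch 1.76 in the results table below.
warmup_steps = 46
approx_total_steps = warmup_steps / 0.1
print(approx_total_steps)  # ~460 (estimate)
```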
#### Training results

The training results are presented in the following table:
| Training Loss | Epoch | Step | Validation Loss |
| --- | --- | --- | --- |
| 1.0297 | 0.0043 | 1 | 1.1468 |
| 0.8512 | 0.2515 | 58 | 0.8729 |
| 0.8496 | 0.5030 | 116 | 0.8193 |
| 0.8175 | 0.7546 | 174 | 0.8033 |
| 0.7868 | 1.0041 | 232 | 0.7961 |
| 0.8119 | 1.2555 | 290 | 0.7934 |
| 0.799 | 1.5069 | 348 | 0.7926 |
| 0.7891 | 1.7583 | 406 | 0.7923 |
#### Framework versions

The versions of the frameworks used are as follows:
- Transformers 4.45.0.dev0
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1
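A quick way to compare a local environment against this list is a version check like the sketch below; the expected values in the comments are taken from the list above, and exact pins may differ for the dev build of Transformers.

```python
# Print installed versions to compare against the framework list above.
import transformers, torch, datasets, tokenizers

print("Transformers:", transformers.__version__)  # expected 4.45.0.dev0
print("PyTorch:", torch.__version__)              # expected 2.4.0+cu121
print("Datasets:", datasets.__version__)          # expected 2.21.0
print("Tokenizers:", tokenizers.__version__)      # expected 0.19.1
```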
## License

The project is licensed under the Apache 2.0 license.