
14B Qwen2.5 Freya X1

Developed by Sao10K
A multi-stage trained model based on Qwen2.5-14B and Qwen2.5-14B-Instruct, focused on text generation and instruction-following tasks.
Downloads: 252
Release date: 12/31/2024

Model Overview

This model adopts a two-stage training approach: LoRA training on literature and raw text first, followed by further fine-tuning on instruction data, with the aim of improving text-generation quality and instruction-following ability.

Model Features

Multi-stage training
Adopts a two-stage training method, base training followed by instruction fine-tuning, to enhance model performance
Efficient fine-tuning
Uses LoRA adapters for parameter-efficient fine-tuning, reducing training costs
Long context support
Supports context lengths of up to 16384 tokens
Optimized training
Employs optimization techniques such as FlashAttention and gradient checkpointing to improve training efficiency
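The training setup above can be summarized as a small configuration sketch. Everything here except the base-model names and the 16384-token context length is an illustrative assumption, not a published hyperparameter:

```python
# Illustrative two-stage training plan for Freya X1.
# Only MAX_CONTEXT and the base-model names come from the model card;
# stage names and data descriptions are assumptions for illustration.
MAX_CONTEXT = 16384  # documented long-context limit, in tokens

stages = [
    {
        "name": "stage1_completion",            # hypothetical stage label
        "base_model": "Qwen/Qwen2.5-14B",
        "method": "lora",                       # parameter-efficient fine-tuning
        "data": "cleaned literary and raw-text corpora",
    },
    {
        "name": "stage2_instruct",              # hypothetical stage label
        "base_model": "Qwen/Qwen2.5-14B-Instruct",
        "method": "lora",
        "data": "instruction-following data",
    },
]

def fits_context(num_tokens: int, limit: int = MAX_CONTEXT) -> bool:
    """Check whether a training sample fits the model's context window."""
    return 0 < num_tokens <= limit
```

A data-preparation pipeline could use `fits_context` to filter or chunk long literary documents before either stage, since both stages share the same context limit.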

Model Capabilities

Text generation
Instruction understanding
Literary creation
Dialogue systems

Use Cases

Content creation
Literary creation
Generates literary works such as novels and essays
Trained on cleaned literary datasets, capable of producing relatively high-quality literary content
Dialogue systems
Intelligent assistant
Builds instruction-following dialogue assistants
Fine-tuned on instruction data to improve instruction understanding and execution capabilities
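As a minimal sketch of how such a dialogue assistant might prompt the model, assuming Freya X1 inherits the Qwen2.5-style ChatML chat template (the template choice is an assumption; verify it against the released tokenizer configuration):

```python
def chatml_prompt(system: str, user: str) -> str:
    """Format a single-turn prompt in Qwen-style ChatML.

    Assumes the model uses the <|im_start|>/<|im_end|> special tokens;
    check the model's tokenizer config before relying on this layout.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = chatml_prompt(
    "You are a helpful writing assistant.",
    "Write the opening line of a short story set in winter.",
)
```

In practice the string would be tokenized and passed to the model's generation call; a served deployment would instead apply the tokenizer's built-in chat template rather than hand-formatting the prompt.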