Transformers
This project uses the transformers library to provide a fine-tuned model, along with the configuration and training details used to produce it. The goal is to adapt a pre-trained model to a specific task.
Quick Start
The model is a fine-tuned version of [arcee-ai/Llama-3.1-SuperNova-Lite](https://huggingface.co/arcee-ai/Llama-3.1-SuperNova-Lite). The configuration and training details below show how it was developed.
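As a starting point, the following is a minimal sketch of how a fine-tuned Llama 3.1 checkpoint like this one might be loaded and queried with transformers. The repository ID in `MODEL_ID` is a hypothetical placeholder (this card does not state where the fine-tuned weights are published), and `build_messages` and `generate_reply` are illustrative helper names, not part of any library. The sketch assumes the checkpoint ships a Llama 3 chat template and uses `<|eot_id|>` as its end-of-turn token.

```python
# Hypothetical placeholder: replace with the actual Hub repository ID of the fine-tuned model.
MODEL_ID = "your-namespace/your-finetuned-model"


def build_messages(user_text, system_text=None):
    """Build a conversation in the role/content format that chat templates expect."""
    messages = []
    if system_text:
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    return messages


def generate_reply(model_id, messages, max_new_tokens=256):
    """Load the model and generate a single assistant turn (illustrative helper)."""
    # Imported lazily so build_messages can be used without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    # Render the conversation with the tokenizer's chat template and tokenize it.
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    # Assumption: generation should stop at the Llama 3 end-of-turn token.
    eot_id = tokenizer.convert_tokens_to_ids("<|eot_id|>")
    output = model.generate(input_ids, max_new_tokens=max_new_tokens, eos_token_id=eot_id)

    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
```

With the real repository ID filled in, a call such as `generate_reply(MODEL_ID, build_messages("Hello!"))` would return the model's reply as a string. The model is loaded in bfloat16 on the default device here; larger setups may want a different dtype or device placement.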
[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
See axolotl config
Axolotl version: 0.4.1
base_model: arcee-ai/Llama-3.1-SuperNova-Lite
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
  - path: NewEden/CharacterAI-logs-sharegpt-Ngram-Cleaned
    type: sharegpt
    conversation: llama3
  - path: NewEden/OpenCAI-ShareGPT
    type: sharegpt
    conversation: llama3
chat_template: llama3
output_dir: ./outputs
adapter:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:
sequence_len: 16384
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
wandb_project: CAI-Supernova
wandb_entity:
wandb_watch:
wandb_name: CAI-Supernova-2
wandb_log_model:
plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true
gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 4
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 1e-5
weight_decay: 0.05
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true
gradient_checkpointing: unsloth
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 15
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16_cpuoffload_params.json
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|finetune_right_pad_id|>
  eos_token: <|eot_id|>
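The config combines `learning_rate: 1e-5`, `lr_scheduler: cosine`, and `warmup_steps: 15`. The sketch below shows the shape of such a schedule, assuming the common formulation of linear warmup from zero followed by a half-cosine decay to zero. The total number of optimizer steps depends on the dataset size and is not stated in this card, so `TOTAL_STEPS` is a hypothetical value chosen only for illustration, and `lr_at_step` is an illustrative helper name.

```python
import math

BASE_LR = 1e-5      # learning_rate from the config
WARMUP_STEPS = 15   # warmup_steps from the config


def lr_at_step(step, total_steps, base_lr=BASE_LR, warmup_steps=WARMUP_STEPS):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from base_lr down to 0.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Hypothetical total step count; the real value is not given in this card.
TOTAL_STEPS = 1015

for step in (0, 15, 515, 1015):
    print(step, lr_at_step(step, TOTAL_STEPS))
```

Under these assumptions the rate peaks at the configured value once warmup ends, passes through half of it midway through the decay phase, and reaches zero at the final step.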
Documentation
Model Description
This model is a fine-tuned version of [arcee-ai/Llama-3.1-SuperNova-Lite](https://huggingface.co/arcee-ai/Llama-3.1-SuperNova-Lite), trained on the NewEden/CharacterAI-logs-sharegpt-Ngram-Cleaned and NewEden/OpenCAI-ShareGPT datasets listed in the axolotl config.
Intended Uses & Limitations
More information needed
Training and Evaluation Data
More information needed
Training Procedure
Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 15
- num_epochs: 4
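The total batch sizes in the list above follow from the per-device values: the training total multiplies the per-device batch size by the gradient accumulation steps and the device count, while the evaluation total has no accumulation factor. The sketch below reproduces that arithmetic and, using `sequence_len: 16384` from the axolotl config, derives an upper bound on tokens per optimizer step. The function and variable names are illustrative.

```python
def effective_batch_sizes(train_batch_size, eval_batch_size, grad_accum_steps, num_devices):
    """Return (total_train_batch_size, total_eval_batch_size) across all devices."""
    # Each device accumulates gradients over several micro-batches before one update.
    train_total = train_batch_size * grad_accum_steps * num_devices
    # Evaluation runs one batch per device with no accumulation.
    eval_total = eval_batch_size * num_devices
    return train_total, eval_total


# Values taken from the hyperparameter list above.
train_total, eval_total = effective_batch_sizes(
    train_batch_size=1, eval_batch_size=1, grad_accum_steps=2, num_devices=4
)

# sequence_len from the axolotl config; each packed sequence holds at most this many tokens.
SEQUENCE_LEN = 16384
max_tokens_per_step = train_total * SEQUENCE_LEN

print(train_total, eval_total, max_tokens_per_step)
```

The token figure is a ceiling, not a measurement: with sample packing and padding to the sequence length, some positions in each sequence may be padding.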
Training Results
More information needed
Framework Versions
- Transformers 4.44.2
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
License
The model is released under the llama3 license.
Model Information
| Property | Details |
|---|---|
| Library Name | transformers |
| License | llama3 |
| Base Model | arcee-ai/Llama-3.1-SuperNova-Lite |
| Tags | generated_from_trainer |