🚀 llama-3-bophades-v3-8B
This model is based on Llama-3-8b and is fine-tuned on preference datasets to improve performance. Its use is governed by the META LLAMA 3 COMMUNITY LICENSE AGREEMENT.
✨ Features
Fine-tuned from Llama-3-8b with Direct Preference Optimization (DPO) on the jondurbin/truthy-dpo-v0.1 and kyujinpy/orca_math_dpo datasets, using LoRA adapters and 4-bit loading.
📦 Installation
No installation steps are provided in the original document. The training code below assumes the transformers, datasets, peft, and trl libraries are available, along with bitsandbytes for 4-bit loading.
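Once those libraries are installed, a minimal sketch of loading the model for chat-style inference is shown below. The repository id is a placeholder and the generation settings are illustrative assumptions, not taken from the original card.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder identifier; substitute the actual repository id or local path.
model_id = "llama-3-bophades-v3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The training data is formatted with ChatML-style tags, so prompt accordingly.
prompt = (
    "<|im_start|>user\n"
    "What is 17 * 23?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))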
💻 Usage Examples
Basic Usage
The following code shows the dataset preparation and message formatting:
from datasets import load_dataset, concatenate_datasets
from transformers import AutoTokenizer

# Assumed base checkpoint; the card only states the model is based on Llama-3-8b.
model_name = "meta-llama/Meta-Llama-3-8B"

def chatml_format(example):
    # Build the optional system turn.
    system = ""
    if example.get('system'):
        system = "<|im_start|>system\n" + example['system'] + "<|im_end|>\n"

    # Use the prompt or question field as the user instruction.
    instruction = ""
    if example.get('prompt'):
        instruction = example['prompt']
    if example.get('question'):
        instruction = example['question']

    # Assemble the ChatML-formatted prompt and the chosen/rejected completions.
    prompt = "<|im_start|>user\n" + instruction + "<|im_end|>\n<|im_start|>assistant\n"
    chosen = example['chosen'] + "<|im_end|>\n"
    rejected = example['rejected'] + "<|im_end|>\n"

    return {
        "prompt": system + prompt,
        "chosen": chosen,
        "rejected": rejected,
    }

# Load and concatenate the preference datasets.
ds = [
    "jondurbin/truthy-dpo-v0.1",
    "kyujinpy/orca_math_dpo"
]
loaded_datasets = [load_dataset(dataset_name, split='train') for dataset_name in ds]
dataset = concatenate_datasets(loaded_datasets)
original_columns = dataset.column_names

# Tokenizer with left padding so prompts align during DPO training.
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

# Reformat every example and drop the original columns.
dataset = dataset.map(
    chatml_format,
    remove_columns=original_columns
)
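For illustration, the following sketch (not part of the original card) shows what chatml_format produces for a single made-up record, reusing the function defined above; the field values are invented:

# Hypothetical record with the fields the datasets above provide.
toy_example = {
    "system": "You are a helpful assistant.",
    "question": "What is 2 + 2?",
    "chosen": "2 + 2 = 4.",
    "rejected": "2 + 2 = 5.",
}

formatted = chatml_format(toy_example)
print(formatted["prompt"])
# <|im_start|>system
# You are a helpful assistant.<|im_end|>
# <|im_start|>user
# What is 2 + 2?<|im_end|>
# <|im_start|>assistant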
Advanced Usage
The following code shows the LoRA, model, and training settings:
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, TrainingArguments
from trl import DPOTrainer

# Assumed output name for the fine-tuned model; not specified in the original card.
new_model = "llama-3-bophades-v3-8B"

# LoRA adapter configuration applied to the attention and MLP projection layers.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)

# Policy model to be fine-tuned, loaded in 4-bit with bfloat16 compute.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)
model.config.use_cache = False

# Frozen reference model used for the DPO comparison term.
ref_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)

# Training hyperparameters.
training_args = TrainingArguments(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=1000,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)

# Direct Preference Optimization trainer.
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=2048,
    max_length=4096,
    force_use_ref_model=True
)
dpo_trainer.train()
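The original card does not show any post-training steps. As a sketch of a typical follow-up (paths and names here are illustrative assumptions, continuing from the training code above), the trained LoRA adapter could be saved and merged back into the base model:

from peft import PeftModel

# Save the LoRA adapter produced by DPO training (checkpoint path is assumed).
dpo_trainer.model.save_pretrained("final_checkpoint")
tokenizer.save_pretrained("final_checkpoint")

# Reload the base model in bfloat16 and merge the adapter weights into it.
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
)
merged_model = PeftModel.from_pretrained(base_model, "final_checkpoint")
merged_model = merged_model.merge_and_unload()
merged_model.save_pretrained(new_model)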
📚 Documentation
Method
The model was fine-tuned using an A100 GPU on Google Colab. For more details, see Fine-tune a Mistral-7b model with Direct Preference Optimization by Maxime Labonne.
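For reference, the objective that DPOTrainer minimizes is the standard DPO loss (this is the formulation from the original DPO paper, not something stated in this card); the beta in the expression is the 0.1 value passed to the trainer above:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

where $y_w$ is the chosen response, $y_l$ the rejected response, and $\pi_{\mathrm{ref}}$ the frozen reference model.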
Configuration
The configuration covers dataset preparation, message formatting, and the LoRA, model, and training parameters. The code in the "Usage Examples" section above provides the full implementation.
📄 License
This model is released under the "other" license type, specifically the META LLAMA 3 COMMUNITY LICENSE AGREEMENT (license name: "llama3").
