🚀 llama-3-bophades-v3-8B
This model is based on Llama-3-8b and is fine-tuned on preference datasets to improve performance. Its use is governed by the META LLAMA 3 COMMUNITY LICENSE AGREEMENT.
✨ Features
Fine-tuned from Llama-3-8b with Direct Preference Optimization (DPO) on the jondurbin/truthy-dpo-v0.1 and kyujinpy/orca_math_dpo datasets, using LoRA adapters and 4-bit loading.
📦 Installation
No installation steps are provided in the original document. The training code below assumes the transformers, datasets, peft, and trl libraries are available, along with bitsandbytes for 4-bit loading.
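Once those libraries are installed, a minimal sketch of loading the model for chat-style inference is shown below. The repository id is a placeholder and the generation settings are illustrative assumptions, not taken from the original card.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder identifier; substitute the actual repository id or local path.
model_id = "llama-3-bophades-v3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The training data is formatted with ChatML-style tags, so prompt accordingly.
prompt = (
    "<|im_start|>user\n"
    "What is 17 * 23?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))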
💻 Usage Examples
Basic Usage
The following code shows the dataset preparation and message formatting:
from datasets import load_dataset, concatenate_datasets
from transformers import AutoTokenizer

# Assumed base checkpoint; the card only states the model is based on Llama-3-8b.
model_name = "meta-llama/Meta-Llama-3-8B"

def chatml_format(example):
    # Build the optional system turn.
    system = ""
    if example.get('system'):
        system = "<|im_start|>system\n" + example['system'] + "<|im_end|>\n"

    # Use the prompt or question field as the user instruction.
    instruction = ""
    if example.get('prompt'):
        instruction = example['prompt']
    if example.get('question'):
        instruction = example['question']

    # Assemble the ChatML-formatted prompt and the chosen/rejected completions.
    prompt = "<|im_start|>user\n" + instruction + "<|im_end|>\n<|im_start|>assistant\n"
    chosen = example['chosen'] + "<|im_end|>\n"
    rejected = example['rejected'] + "<|im_end|>\n"

    return {
        "prompt": system + prompt,
        "chosen": chosen,
        "rejected": rejected,
    }

# Load and concatenate the preference datasets.
ds = [
    "jondurbin/truthy-dpo-v0.1",
    "kyujinpy/orca_math_dpo"
]
loaded_datasets = [load_dataset(dataset_name, split='train') for dataset_name in ds]
dataset = concatenate_datasets(loaded_datasets)
original_columns = dataset.column_names

# Tokenizer with left padding so prompts align during DPO training.
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

# Reformat every example and drop the original columns.
dataset = dataset.map(
    chatml_format,
    remove_columns=original_columns
)
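For illustration, the following sketch (not part of the original card) shows what chatml_format produces for a single made-up record, reusing the function defined above; the field values are invented:

# Hypothetical record with the fields the datasets above provide.
toy_example = {
    "system": "You are a helpful assistant.",
    "question": "What is 2 + 2?",
    "chosen": "2 + 2 = 4.",
    "rejected": "2 + 2 = 5.",
}

formatted = chatml_format(toy_example)
print(formatted["prompt"])
# <|im_start|>system
# You are a helpful assistant.<|im_end|>
# <|im_start|>user
# What is 2 + 2?<|im_end|>
# <|im_start|>assistant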
Advanced Usage
The following code shows the LoRA, model, and training settings:
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, TrainingArguments
from trl import DPOTrainer

# Assumed output name for the fine-tuned model; not specified in the original card.
new_model = "llama-3-bophades-v3-8B"

# LoRA adapter configuration applied to the attention and MLP projection layers.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)

# Policy model to be fine-tuned, loaded in 4-bit with bfloat16 compute.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)
model.config.use_cache = False

# Frozen reference model used for the DPO comparison term.
ref_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)

# Training hyperparameters.
training_args = TrainingArguments(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=1000,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)

# Direct Preference Optimization trainer.
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=2048,
    max_length=4096,
    force_use_ref_model=True
)
dpo_trainer.train()
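The original card does not show any post-training steps. As a sketch of a typical follow-up (paths and names here are illustrative assumptions, continuing from the training code above), the trained LoRA adapter could be saved and merged back into the base model:

from peft import PeftModel

# Save the LoRA adapter produced by DPO training (checkpoint path is assumed).
dpo_trainer.model.save_pretrained("final_checkpoint")
tokenizer.save_pretrained("final_checkpoint")

# Reload the base model in bfloat16 and merge the adapter weights into it.
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
)
merged_model = PeftModel.from_pretrained(base_model, "final_checkpoint")
merged_model = merged_model.merge_and_unload()
merged_model.save_pretrained(new_model)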
📚 Documentation
Method
The model was fine-tuned using an A100 GPU on Google Colab. For more details, see Fine-tune a Mistral-7b model with Direct Preference Optimization by Maxime Labonne.
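For reference, the objective that DPOTrainer minimizes is the standard DPO loss (this is the formulation from the original DPO paper, not something stated in this card); the beta in the expression is the 0.1 value passed to the trainer above:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

where $y_w$ is the chosen response, $y_l$ the rejected response, and $\pi_{\mathrm{ref}}$ the frozen reference model.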
Configuration
The configuration covers dataset preparation, message formatting, and the LoRA, model, and training parameters. The code in the "Usage Examples" section above provides the full implementation.
📄 License
This model is released under the "other" license type, specifically the META LLAMA 3 COMMUNITY LICENSE AGREEMENT (license name: "llama3").
