
Llama 3.1 8B ContinuedTraining2 FFT

Developed by ericflo
A full-parameter fine-tuned large language model based on the Meta-Llama-3.1-8B architecture, focused on English text and Python code tasks and trained on a diverse data mixture
Released: September 9, 2024

Model Overview

This is a full-parameter fine-tuned large language model that supports text generation, code completion, and instruction following, with particular strength in Python code tasks

Model Features

Full Parameter Fine-Tuning
Unlike LoRA-style adapter methods, this model updates all parameters during fine-tuning rather than a small adapter subset
Diverse Data Mixture
Combines pretraining and instruction datasets for comprehensive language understanding
Fill-in-the-Middle Training (FIM)
Incorporates FIM tasks to enhance contextual understanding, especially for code completion
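A minimal sketch of how a FIM training sample can be constructed in prefix-suffix-middle (PSM) order; the sentinel strings below are illustrative placeholders, not tokens documented for this model:

```python
# Minimal FIM sample construction in prefix-suffix-middle (PSM) order.
# The sentinel strings are illustrative placeholders (assumptions), not
# the tokens actually used in this model's training.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"

def make_fim_sample(text: str, start: int, end: int) -> str:
    """Hide text[start:end] so the model learns to predict the removed
    middle span from both the preceding and following context."""
    prefix, middle, suffix = text[:start], text[start:end], text[end:]
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"

print(make_fim_sample("def add(a, b):\n    return a + b\n", 15, 27))
```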
8-bit AdamW Optimizer
Uses adamw_bnb_8bit for memory-efficient training
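For reference, a hypothetical Hugging Face transformers configuration enabling this optimizer (requires the bitsandbytes package); the hyperparameter values are illustrative assumptions, not the author's actual settings:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama31-8b-fft",       # assumption: arbitrary output path
    optim="adamw_bnb_8bit",            # 8-bit AdamW from bitsandbytes
    bf16=True,                         # assumption: bf16 mixed precision
    per_device_train_batch_size=1,     # assumption: illustrative value
    gradient_accumulation_steps=16,    # assumption: illustrative value
    learning_rate=2e-5,                # assumption: illustrative value
)
```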
Flash Attention 2
Employs flash_attention_2 to accelerate the training process
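A sketch of how Flash Attention 2 is typically enabled when loading the base model with transformers (requires the flash-attn package and a compatible GPU):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B",           # base model named in this card
    torch_dtype=torch.bfloat16,               # FA2 requires fp16 or bf16
    attn_implementation="flash_attention_2",  # swap in the FA2 kernels
)
```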

Model Capabilities

Text completion and generation
Python code completion
Instruction following
Context-aware text filling
Reverse prediction and instruction back-translation
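A basic generation sketch using the transformers pipeline; the repository id below is inferred from the model name and is an assumption, so substitute the actual Hugging Face id when running:

```python
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="ericflo/Llama-3.1-8B-ContinuedTraining2-FFT",  # assumption: inferred id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
out = generator("Write a haiku about gradient descent:", max_new_tokens=64)
print(out[0]["generated_text"])
```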

Use Cases

Programming Assistance
Python Code Completion
Automatically completes code given a partial snippet
Improves development efficiency and reduces coding errors
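An illustrative completion call for this use case, assuming the same inferred repository id; the prompt and sampling settings are arbitrary:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "ericflo/Llama-3.1-8B-ContinuedTraining2-FFT"  # assumption: inferred id
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)

# Give the model a function signature and docstring and let it continue.
prompt = 'def fibonacci(n: int) -> int:\n    """Return the n-th Fibonacci number."""\n'
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=80, do_sample=True, temperature=0.2)
print(tok.decode(out[0], skip_special_tokens=True))
```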
Text Processing
Text Filling
Generates intermediate content given text prefixes and suffixes
Enhances text coherence and logical consistency
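A sketch of inference-time infilling, reusing `tok` and `model` from the completion sketch above and the same placeholder sentinels from the FIM sketch; whether the model responds to these exact sentinels is an assumption:

```python
# Lay out the known prefix and suffix in PSM order and generate the middle.
prefix = "The experiment began at dawn. "
suffix = " By nightfall, every sample had been catalogued."
prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=60)
# Keep only the newly generated tokens (the predicted middle span).
middle = tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(middle)
```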