MaziyarPanahi/Llama-3-8B-Instruct-64k
This is a text-generation model based on Llama-3, with the context length extended to 64k via PoSE and continued pretraining.
Features
- Extended Context Length: This model uses PoSE to extend Llama-3's context length from 8k to 64k at rope_theta: 500000.0. After continued pretraining, rope_theta was set to 2M to potentially extend the context further past 64k.
- Training Data: Trained on a subset of the RedPajama v1 dataset with text between 6k and 8k tokens of context. A rank-stabilized LoRA of rank 256 was trained; training logs are available on WandB.
- Quantized GGUF: All GGUF models come with a context length of 64000. See MaziyarPanahi/Llama-3-8B-Instruct-64k-GGUF; a hedged loading sketch follows this list.
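The GGUF quants can be run with any llama.cpp-compatible runtime. Below is a minimal sketch using llama-cpp-python, which is not mentioned in the original README; the repo id is real, but the quant filename pattern and the generation call are illustrative assumptions.

from llama_cpp import Llama  # assumes llama-cpp-python and huggingface-hub are installed

# Download a quant from the GGUF repo and open it with the full 64k window.
llm = Llama.from_pretrained(
    repo_id="MaziyarPanahi/Llama-3-8B-Instruct-64k-GGUF",
    filename="*Q4_K_M.gguf",  # illustrative pattern; pick any quant level present in the repo
    n_ctx=64000,              # the GGUF models ship with a 64k context length
    verbose=False,
)

# Simple chat-style generation to confirm the model loads and responds.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the plot of Moby-Dick in two sentences."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])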
Installation
No specific installation steps are provided in the original README. If you use the model with the transformers library, make sure transformers is installed; you can install it via pip install transformers.
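As a quick sanity check (an addition, not from the original README), you can confirm the required packages import and that a GPU is visible before loading the 8B model in bfloat16:

import torch
import transformers

# Print library versions and whether a CUDA device is available.
print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())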
Usage Examples
Basic Usage
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from transformers import pipeline
import torch

model_id = "MaziyarPanahi/Llama-3-8B-Instruct-64k"

# Load the model in bfloat16 and spread it across available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,
)

# Stream generated tokens to stdout as they are produced.
streamer = TextStreamer(tokenizer)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    model_kwargs={"torch_dtype": torch.bfloat16},
    streamer=streamer,
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Build the prompt string with the model's chat template.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Stop generation on either the EOS token or the <|im_end|> token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|im_end|>"),
]

outputs = pipe(
    prompt,
    max_new_tokens=8192,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)

# Print only the newly generated text (strip the prompt prefix).
print(outputs[0]["generated_text"][len(prompt):])
Advanced Usage
There is no advanced usage example in the original README. To adjust more generation parameters or use the model in other scenarios, refer to the transformers library documentation; a hedged sketch of calling model.generate directly is shown below.
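The following is a minimal sketch, not taken from the original README, that reuses model, tokenizer, messages, and terminators from the basic example above and calls model.generate directly instead of going through the pipeline wrapper.

# Tokenize the chat prompt and move it to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=512,
        eos_token_id=terminators,
        do_sample=True,
        temperature=0.6,
        top_p=0.95,
    )

# Decode only the newly generated tokens, skipping special tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))

Calling model.generate directly gives finer control over caching and batching, which can matter when prompts approach the 64k window.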
Documentation
- Model Base: This model is based on winglian/Llama-3-8b-64k-PoSE by @winglian.
- Training Details: PoSE was used with continued pretraining on 300M tokens from the RedPajama V1 dataset, using data between 6k and 8k tokens in length.
Technical Details
This model uses PoSE to extend the context length from 8k to 64k. After continued pretraining, rope_theta was set to 2M to potentially extend the context further past 64k. A rank-stabilized LoRA of rank 256 was trained on a subset of the RedPajama v1 dataset with text between 6k and 8k tokens of context.
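As a hedged illustration (not from the original README), the positional settings described above can be checked against the published configuration; the printed values come from the repo's config.json, not from this snippet.

from transformers import AutoConfig

# Inspect the positional-encoding settings shipped with the model.
cfg = AutoConfig.from_pretrained("MaziyarPanahi/Llama-3-8B-Instruct-64k")
print("max_position_embeddings:", cfg.max_position_embeddings)  # extended context window
print("rope_theta:", cfg.rope_theta)  # the card reports rope_theta raised to 2M after continued pretraining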
License
The model is under the llama3 license. For more details, check the LICENSE file.
