MaziyarPanahi/Llama-3-8B-Instruct-64k
This is a text-generation model based on Llama-3, with the context length extended to 64k via PoSE and continued pretraining.
Features
- Extended Context Length: This model uses PoSE to extend Llama-3's context length from 8k to 64k at rope_theta: 500000.0. After continued pretraining, rope_theta was set to 2M to potentially extend the context further past 64k.
- Training Data: Trained on a subset of the RedPajama v1 dataset with text between 6k and 8k tokens of context. A rank-stabilized LoRA of rank 256 was trained; training logs are available on WandB.
- Quantized GGUF: All GGUF models come with a context length of 64000. See MaziyarPanahi/Llama-3-8B-Instruct-64k-GGUF; a hedged loading sketch follows this list.
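The GGUF quants can be run with any llama.cpp-compatible runtime. Below is a minimal sketch using llama-cpp-python, which is not mentioned in the original README; the repo id is real, but the quant filename pattern and the generation call are illustrative assumptions.

from llama_cpp import Llama  # assumes llama-cpp-python and huggingface-hub are installed

# Download a quant from the GGUF repo and open it with the full 64k window.
llm = Llama.from_pretrained(
    repo_id="MaziyarPanahi/Llama-3-8B-Instruct-64k-GGUF",
    filename="*Q4_K_M.gguf",  # illustrative pattern; pick any quant level present in the repo
    n_ctx=64000,              # the GGUF models ship with a 64k context length
    verbose=False,
)

# Simple chat-style generation to confirm the model loads and responds.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the plot of Moby-Dick in two sentences."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])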
Installation
No specific installation steps are provided in the original README. If you use the model with the transformers library, make sure transformers is installed; you can install it via pip install transformers.
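As a quick sanity check (an addition, not from the original README), you can confirm the required packages import and that a GPU is visible before loading the 8B model in bfloat16:

import torch
import transformers

# Print library versions and whether a CUDA device is available.
print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())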
Usage Examples
Basic Usage
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from transformers import pipeline
import torch

model_id = "MaziyarPanahi/Llama-3-8B-Instruct-64k"

# Load the model in bfloat16 and spread it across available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,
)

# Stream generated tokens to stdout as they are produced.
streamer = TextStreamer(tokenizer)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    model_kwargs={"torch_dtype": torch.bfloat16},
    streamer=streamer,
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Build the prompt string with the model's chat template.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Stop generation on either the EOS token or the <|im_end|> token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|im_end|>"),
]

outputs = pipe(
    prompt,
    max_new_tokens=8192,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)

# Print only the newly generated text (strip the prompt prefix).
print(outputs[0]["generated_text"][len(prompt):])
Advanced Usage
There is no advanced usage example in the original README. To adjust more generation parameters or use the model in other scenarios, refer to the transformers library documentation; a hedged sketch of calling model.generate directly is shown below.
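The following is a minimal sketch, not taken from the original README, that reuses model, tokenizer, messages, and terminators from the basic example above and calls model.generate directly instead of going through the pipeline wrapper.

# Tokenize the chat prompt and move it to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=512,
        eos_token_id=terminators,
        do_sample=True,
        temperature=0.6,
        top_p=0.95,
    )

# Decode only the newly generated tokens, skipping special tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))

Calling model.generate directly gives finer control over caching and batching, which can matter when prompts approach the 64k window.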
Documentation
- Model Base: This model is based on winglian/Llama-3-8b-64k-PoSE by @winglian.
- Training Details: PoSE was used with continued pretraining on 300M tokens from the RedPajama V1 dataset, using data between 6k and 8k tokens in length.
Technical Details
This model uses PoSE to extend the context length from 8k to 64k. After continued pretraining, rope_theta was set to 2M to potentially extend the context further past 64k. A rank-stabilized LoRA of rank 256 was trained on a subset of the RedPajama v1 dataset with text between 6k and 8k tokens of context.
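As a hedged illustration (not from the original README), the positional settings described above can be checked against the published configuration; the printed values come from the repo's config.json, not from this snippet.

from transformers import AutoConfig

# Inspect the positional-encoding settings shipped with the model.
cfg = AutoConfig.from_pretrained("MaziyarPanahi/Llama-3-8B-Instruct-64k")
print("max_position_embeddings:", cfg.max_position_embeddings)  # extended context window
print("rope_theta:", cfg.rope_theta)  # the card reports rope_theta raised to 2M after continued pretraining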
License
The model is under the llama3 license. For more details, check the LICENSE file.
