TinyLlama-1.1B
The TinyLlama project is focused on pretraining a 1.1B Llama model on 3 trillion tokens. Through appropriate optimization, this can be accomplished in "just" 90 days using 16 A100-40G GPUs. Training commenced on 2023-09-01.
Project Link
This model adopts the exact same architecture and tokenizer as Llama 2, so it can be dropped into numerous open-source projects built on Llama. Moreover, with only 1.1B parameters, TinyLlama is highly compact, making it suitable for applications with constrained computation and memory budgets.
Quick Start
Prerequisites
You will need `transformers>=4.31`. For more information, check the TinyLlama GitHub page.
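If your environment does not meet this requirement, a typical upgrade looks like the following (assuming a standard pip setup; `accelerate` is added here because the example below uses `device_map="auto"`):

```bash
pip install "transformers>=4.31" accelerate
```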
Example Code
```python
from transformers import AutoTokenizer
import transformers
import torch

model = "PY007/TinyLlama-1.1B-step-50K-105b"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

sequences = pipeline(
    'The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens. With some proper optimization, we can achieve this within a span of "just" 90 days using 16 A100-40G GPUs. The training has started on 2023-09-01.',
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    repetition_penalty=1.5,
    eos_token_id=tokenizer.eos_token_id,
    max_length=500,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```
Features
- Same Architecture as Llama 2: It can be used in many Llama-based open-source projects (see the loading sketch after this list).
- Compact Size: With only 1.1B parameters, it is suitable for applications with limited resources.
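Because the model shares the Llama 2 architecture and tokenizer, it also loads through the standard `AutoModelForCausalLM` API rather than the `pipeline` helper. A minimal sketch (the prompt and generation settings here are illustrative, not prescribed by the project):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PY007/TinyLlama-1.1B-step-50K-105b"

# Same Llama 2 architecture, so the standard auto classes work unchanged.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires accelerate
)

# Illustrative prompt; any text works.
inputs = tokenizer("The TinyLlama project aims to", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_k=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```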
Documentation
This Model
This is an intermediate checkpoint with 50K steps and 105B tokens.
Release Schedule
We will be releasing intermediate checkpoints according to the following schedule. Some baseline models are also included for comparison.
License
This project is licensed under the Apache-2.0 license.
| Property | Details |
|----------|---------|
| Model Type | TinyLlama-1.1B |
| Training Data | cerebras/SlimPajama-627B, bigcode/starcoderdata |