TinyLlama-1.1B
The TinyLlama project is on a mission to pretrain a 1.1B Llama model on 3 trillion tokens. Through appropriate optimization, this can be accomplished in "just" 90 days using 16 A100-40G GPUs 🚀🚀. The training kicked off on 2023-09-01.
Check out the project on GitHub
TinyLlama adopts the exact same architecture and tokenizer as Llama 2. This enables it to be easily integrated into numerous open-source projects built on Llama. Moreover, with only 1.1B parameters, TinyLlama is highly compact, making it suitable for a wide range of applications with limited computation and memory budgets.
🚀 Quick Start
The TinyLlama project is focused on pretraining a 1.1B Llama model on 3 trillion tokens. With proper optimization, the training can be completed in 90 days using 16 A100-40G GPUs. The training process began on September 1, 2023.
✨ Features
- Same Architecture as Llama 2: TinyLlama uses the same architecture and tokenizer as Llama 2, ensuring compatibility with many open-source Llama-based projects (see the sketch after this list).
- Compact Model: With only 1.1B parameters, it can meet the needs of applications with restricted computational and memory resources.
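Because the checkpoint follows the standard Llama architecture, it resolves to the regular `transformers` auto classes. The minimal sketch below illustrates the compatibility claim; the expected printed values (`llama` as the model type, a 32,000-token vocabulary) describe the Llama 2 family and are an assumption of this example, not figures taken from the card.

```python
from transformers import AutoConfig, AutoTokenizer

model_id = "PY007/TinyLlama-1.1B-intermediate-step-715k-1.5T"

# The configuration resolves to the standard Llama architecture family.
config = AutoConfig.from_pretrained(model_id)
print(config.model_type)     # expected: "llama"

# The tokenizer is the Llama 2 tokenizer (32,000-token vocabulary).
tokenizer = AutoTokenizer.from_pretrained(model_id)
print(tokenizer.vocab_size)  # expected: 32000
```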
📦 Installation
You will need transformers>=4.31. Check the TinyLlama GitHub page for more information.
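Assuming pip as the package manager, `pip install "transformers>=4.31"` satisfies the requirement. The short sketch below is an optional runtime check of the installed version; it relies on the `packaging` helper, which ships as a `transformers` dependency.

```python
# Optional sanity check: confirm the installed transformers version
# meets the documented requirement before loading TinyLlama.
import transformers
from packaging import version

required = version.parse("4.31")
installed = version.parse(transformers.__version__)
assert installed >= required, (
    f"transformers {transformers.__version__} found; TinyLlama needs >= 4.31"
)
print(f"transformers {transformers.__version__} OK")
```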
💻 Usage Examples
Basic Usage
```python
from transformers import AutoTokenizer
import transformers
import torch

model = "PY007/TinyLlama-1.1B-intermediate-step-715k-1.5T"
tokenizer = AutoTokenizer.from_pretrained(model)

# Build a text-generation pipeline; float16 weights and device_map="auto"
# let the 1.1B model fit comfortably on a single GPU.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

sequences = pipeline(
    'The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens. With some proper optimization, we can achieve this within a span of "just" 90 days using 16 A100-40G GPUs 🚀🚀. The training has started on 2023-09-01.',
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    repetition_penalty=1.5,
    eos_token_id=tokenizer.eos_token_id,
    max_length=500,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```
📚 Documentation
This Model
This is an intermediate checkpoint trained for 715K steps on 1.49T tokens. We suggest not using it directly for inference.
Eval
| Property | Details |
|----------|---------|
| Model Type | TinyLlama-1.1B |
| Training Data | cerebras/SlimPajama-627B, bigcode/starcoderdata |
| Model | Pretrain Tokens | HellaSwag | Obqa | WinoGrande | ARC_c | ARC_e | boolq | piqa | avg |
|-------|-----------------|-----------|------|------------|-------|-------|-------|------|-----|
| Pythia-1.0B | 300B | 47.16 | 31.40 | 53.43 | 27.05 | 48.99 | 60.83 | 69.21 | 48.30 |
| TinyLlama-1.1B-intermediate-step-50K-104b | 103B | 43.50 | 29.80 | 53.28 | 24.32 | 44.91 | 59.66 | 67.30 | 46.11 |
| TinyLlama-1.1B-intermediate-step-240k-503b | 503B | 49.56 | 31.40 | 55.80 | 26.54 | 48.32 | 56.91 | 69.42 | 48.28 |
| TinyLlama-1.1B-intermediate-step-480k-1007B | 1007B | 52.54 | 33.40 | 55.96 | 27.82 | 52.36 | 59.54 | 69.91 | 50.22 |
| TinyLlama-1.1B-intermediate-step-715k-1.5T | 1.49T | 53.68 | 35.20 | 58.33 | 29.18 | 51.89 | 59.08 | 71.65 | 51.29 |
📄 License
This project is licensed under the Apache-2.0 license.