TinyLlama-1.1B
The TinyLlama project is focused on pretraining a 1.1B Llama model on 3 trillion tokens. Through appropriate optimization, this can be accomplished in "just" 90 days using 16 A100-40G GPUs. Training commenced on 2023-09-01.
Project Link
This model adopts the exact same architecture and tokenizer as Llama 2, so it can be dropped into numerous open-source projects built on Llama. Moreover, with only 1.1B parameters, TinyLlama is highly compact, making it suitable for applications with constrained computation and memory budgets.
Quick Start
Prerequisites
You will need `transformers>=4.31`. For more information, check the TinyLlama GitHub page.
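If your environment does not meet this requirement, a typical upgrade looks like the following (assuming a standard pip setup; `accelerate` is added here because the example below uses `device_map="auto"`):

```bash
pip install "transformers>=4.31" accelerate
```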
Example Code
```python
from transformers import AutoTokenizer
import transformers
import torch

model = "PY007/TinyLlama-1.1B-step-50K-105b"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

sequences = pipeline(
    'The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens. With some proper optimization, we can achieve this within a span of "just" 90 days using 16 A100-40G GPUs. The training has started on 2023-09-01.',
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    repetition_penalty=1.5,
    eos_token_id=tokenizer.eos_token_id,
    max_length=500,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```
Features
- Same Architecture as Llama 2: It can be used in many Llama-based open-source projects (see the loading sketch after this list).
- Compact Size: With only 1.1B parameters, it is suitable for applications with limited resources.
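Because the model shares the Llama 2 architecture and tokenizer, it also loads through the standard `AutoModelForCausalLM` API rather than the `pipeline` helper. A minimal sketch (the prompt and generation settings here are illustrative, not prescribed by the project):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PY007/TinyLlama-1.1B-step-50K-105b"

# Same Llama 2 architecture, so the standard auto classes work unchanged.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires accelerate
)

# Illustrative prompt; any text works.
inputs = tokenizer("The TinyLlama project aims to", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_k=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```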
Documentation
This Model
This is an intermediate checkpoint with 50K steps and 105B tokens.
Release Schedule
We will be releasing intermediate checkpoints according to the following schedule. Some baseline models are also included for comparison.
License
This project is licensed under the Apache-2.0 license.
| Property | Details |
|----------|---------|
| Model Type | TinyLlama-1.1B |
| Training Data | cerebras/SlimPajama-627B, bigcode/starcoderdata |