TinyLlama 1.1B Intermediate Step 1431k 3T
TinyLlama is a 1.1B parameter Llama model pretrained on 3 trillion tokens, designed to provide compact and efficient text generation capabilities.
Downloads: 25.04k
Release Time: 12/28/2023
Model Overview
The TinyLlama project aims to pretrain a 1.1B parameter Llama model on 3 trillion tokens. With appropriate optimizations, training can be completed in 90 days on 16 A100-40G GPUs.
Model Features
Efficient Pretraining
Pretrained on 3 trillion tokens, optimized to complete training within 90 days.
Compact Model
Only 1.1B parameters, suitable for applications with computational and memory constraints.
Compatibility
Adopts the same architecture and tokenizer as Llama 2, enabling plug-and-play integration into many Llama-based open-source projects.
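Because the architecture and tokenizer match Llama 2, the checkpoint can be loaded with standard Hugging Face tooling. The sketch below is a minimal illustration; the repository id TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T is assumed from the model name above.

```python
# Minimal sketch: load TinyLlama with standard Llama 2 tooling.
# The repository id below is assumed from the model name on this page.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"

tokenizer = AutoTokenizer.from_pretrained(model_id)      # same tokenizer as Llama 2
model = AutoModelForCausalLM.from_pretrained(model_id)   # resolves to the Llama architecture

# Since the architecture matches Llama 2, the loaded class is the familiar one.
print(type(model).__name__)  # expected: LlamaForCausalLM
```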
Model Capabilities
Text Generation
Reasoning Tasks
Question Answering Systems
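A minimal generation sketch for the capabilities listed above, using the transformers text-generation pipeline; the prompt and decoding settings are illustrative only.

```python
# Minimal text-generation sketch (illustrative prompt and settings).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T",
)

prompt = "Question: What is the capital of France?\nAnswer:"
result = generator(prompt, max_new_tokens=32, do_sample=False)
print(result[0]["generated_text"])
```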
Use Cases
Natural Language Processing
AI2 Reasoning Challenge
Used to answer grade-school science questions from the AI2 Reasoning Challenge (ARC)
Normalized accuracy: 33.87
HellaSwag
Used for commonsense sentence-completion evaluation on the HellaSwag dataset
Normalized accuracy: 60.31
Education
MMLU
Used for multi-task language understanding evaluation
Accuracy: 26.04