🚀 LiteLlama: Reduced-Scale Llama
We present an open-source reproduction of Meta AI's LLaMa 2 with significantly reduced model sizes. For instance, LiteLlama-460M-1T has 460M parameters and is trained on roughly 1T tokens.
✨ Features
- Reduced Scale: Significantly smaller model sizes compared to the original LLaMa 2.
- Open-Source: An open-source reproduction for community use.
📦 Installation
No separate installation step is needed; the usage example below only requires PyTorch and Hugging Face Transformers (`pip install torch transformers`).
💻 Usage Examples
Basic Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = 'ahxt/LiteLlama-460M-1T'

# Load the checkpoint and its tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.eval()

prompt = 'Q: What is the largest bird?\nA:'
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Greedy decoding, capped at 20 tokens including the prompt.
with torch.no_grad():
    tokens = model.generate(input_ids, max_length=20)
print(tokenizer.decode(tokens[0].tolist(), skip_special_tokens=True))
```
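The same generation can also be run through the `pipeline` API; a minimal alternative sketch (the prompt and length settings simply mirror the example above):

```python
from transformers import pipeline

# One-liner equivalent using the text-generation pipeline.
generator = pipeline('text-generation', model='ahxt/LiteLlama-460M-1T')
print(generator('Q: What is the largest bird?\nA:', max_length=20)[0]['generated_text'])
```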
📚 Documentation
Dataset and Tokenization
We train our models on part of the RedPajama dataset and use the GPT2Tokenizer to tokenize the text.
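As an illustration, the same tokenizer can be loaded directly; a minimal sketch (the sample sentence is ours, not from the training data):

```python
from transformers import GPT2Tokenizer

# GPT-2's byte-level BPE tokenizer, as used to tokenize the training text.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
ids = tokenizer("LiteLlama is a reduced-scale LLaMa.")["input_ids"]
print(len(ids), ids[:5])
```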
Training Details
The model was trained with ~1T tokens (0.98T): number of tokens = steps × sequence length × batch size = 499,679 × 1,024 × 192 = 98,240,888,832 ≈ 0.98T.
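The arithmetic can be checked directly (values copied from the formula above):

```python
# Worked check of the token-count formula above.
steps, seq_length, batch_size = 499_679, 1_024, 192
total_tokens = steps * seq_length * batch_size
print(f"{total_tokens:,}")  # 98,240,888,832
```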
The training curve is available in this WandB project.
Evaluation
MMLU Task Evaluation
| Models | #parameters | zero-shot | 5-shot |
| --- | --- | --- | --- |
| llama | 7B | 28.46 | 35.05 |
| openllama | 3B | 24.90 | 26.71 |
| TinyLlama-1.1B-step-50K-105b | 1.1B | 19.00 | 26.53 |
| LiteLlama-460M-1T | 0.46B | 21.13 | 26.39 |
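The README does not state how these scores were produced; a common way to reproduce MMLU numbers for Hub checkpoints is EleutherAI's lm-evaluation-harness. The sketch below is an assumption, not the authors' documented pipeline:

```python
import lm_eval  # pip install lm-eval

# Hypothetical reproduction sketch: 5-shot MMLU on the LiteLlama checkpoint.
# Task naming and harness version are assumptions, not from the README.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=ahxt/LiteLlama-460M-1T",
    tasks=["mmlu"],
    num_fewshot=5,
)
print(results["results"])
```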
Detailed results can be found here.
| Metric | Value |
| --- | --- |
| Avg. | 26.65 |
| ARC (25-shot) | 24.91 |
| HellaSwag (10-shot) | 38.47 |
| MMLU (5-shot) | 26.17 |
| TruthfulQA (0-shot) | 41.59 |
| Winogrande (5-shot) | 49.88 |
| GSM8K (5-shot) | 0.0 |
| DROP (3-shot) | 5.51 |
📄 License
This model is released under the MIT License.
📞 Contact
This model was developed by Xiaotian Han of the DATA Lab at Texas A&M University, under the supervision of Prof. Xia "Ben" Hu.