Pythia is a suite of causal language models developed by EleutherAI specifically for interpretability research. It spans 8 model sizes ranging from 70 million to 12 billion parameters and provides 154 training checkpoints per model.
Pythia-410M is a Transformer-based English language model built on the GPT-NeoX architecture and trained on the Pile dataset. It is primarily intended for studying the behavior and functionality of large language models.
Model Features
Complete Training Checkpoints
Provides 154 intermediate training checkpoints to facilitate the study of model evolution.
Scientific Experimental Design
All model sizes are trained on the same data, seen in the same order, to ensure experimental comparability.
Performance Benchmarking
Achieves or surpasses the performance of similar-scale models (e.g., OPT, GPT-Neo).
Deduplication Comparison
Each model size offers two versions: one trained on original data and another on globally deduplicated data.
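Both variants of each size are published as separate repositories on the Hugging Face Hub (for example, EleutherAI/pythia-410m and EleutherAI/pythia-410m-deduped). A minimal sketch of loading the two variants side by side for such a comparison, assuming the transformers library is installed:

```python
from transformers import GPTNeoXForCausalLM

# 410M model trained on the original (non-deduplicated) Pile
model_standard = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-410m")

# 410M model trained on the globally deduplicated Pile
model_deduped = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-410m-deduped")
```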
Model Capabilities
English Text Generation
Language Model Behavior Research
Model Interpretability Analysis
Use Cases
Academic Research
Language Model Behavior Analysis
Study how model parameters change across different training stages.
Track the development of model capabilities through the 154 checkpoints (a minimal sketch appears at the end of this section).
Deduplicated Data Impact Study
Compare performance differences between models trained on original and deduplicated data.
Technical Validation
Medium-Scale Model Benchmarking
Serves as a reference model at the roughly 400M-parameter scale for technical comparisons.
Matches or exceeds the performance of comparable models such as OPT-350M.
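The checkpoint-tracking use case can be sketched as follows: load a few revisions of the same model and compare their language-modeling loss on a fixed piece of text. The model ID and revision names are real branches on the Hugging Face Hub; the particular steps and prompt are illustrative choices, not a prescribed protocol.

```python
import torch
from transformers import AutoTokenizer, GPTNeoXForCausalLM

model_id = "EleutherAI/pythia-410m-deduped"
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("The capital of France is Paris.", return_tensors="pt")

# Compare language-modeling loss at a few points during training
for step in ["step1000", "step16000", "step143000"]:
    model = GPTNeoXForCausalLM.from_pretrained(model_id, revision=step)
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    print(f"{step}: loss = {loss.item():.3f}")
```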
🚀 Pythia-410M
The Pythia Scaling Suite is a collection of models developed to facilitate interpretability research (see paper). It contains two sets of eight models of sizes 70M, 160M, 410M, 1B, 1.4B, 2.8B, 6.9B, and 12B. The suite is designed to promote scientific research on large language models, especially interpretability research, and its models match or exceed the performance of similar models of the same size.
🚀 Quick Start
Pythia models can be loaded and used via the following code, demonstrated here for the third pythia-70m-deduped checkpoint:
```python
from transformers import GPTNeoXForCausalLM, AutoTokenizer

model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-70m-deduped",
    revision="step3000",
    cache_dir="./pythia-70m-deduped/step3000",
)

tokenizer = AutoTokenizer.from_pretrained(
    "EleutherAI/pythia-70m-deduped",
    revision="step3000",
    cache_dir="./pythia-70m-deduped/step3000",
)

inputs = tokenizer("Hello, I am", return_tensors="pt")
tokens = model.generate(**inputs)
print(tokenizer.decode(tokens[0]))
```
Revision/branch step143000 corresponds exactly to the model checkpoint on the main branch of each model. For more information on how to use all Pythia models, see documentation on GitHub.
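Omitting the revision argument loads this final, main-branch checkpoint, so the following is equivalent to passing revision="step143000":

```python
from transformers import GPTNeoXForCausalLM

# The main branch holds the final checkpoint, identical to revision="step143000"
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-70m-deduped")
```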
✨ Features
Research-Oriented: The Pythia suite is developed to facilitate interpretability research on large language models.
Multiple Sizes and Checkpoints: It contains models of various sizes (70M to 12B) and provides 154 intermediate checkpoints per model, hosted on Hugging Face as branches.
Performance: The models match or exceed the performance of similarly sized models such as those in the OPT and GPT-Neo suites.
📦 Installation
Pythia models are loaded through the Hugging Face transformers library (pip install transformers); no installation specific to Pythia is required.
💻 Usage Examples
Basic Usage
The basic usage is identical to the Quick Start example above: load GPTNeoXForCausalLM and AutoTokenizer for a chosen checkpoint, tokenize a prompt, and call model.generate.
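Beyond greedy decoding, the standard transformers generation parameters can be used with any Pythia checkpoint. The sampling settings below are illustrative choices rather than recommendations from the Pythia authors:

```python
from transformers import GPTNeoXForCausalLM, AutoTokenizer

model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-410m")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-410m")

inputs = tokenizer("Hello, I am", return_tensors="pt")
tokens = model.generate(
    **inputs,
    max_new_tokens=50,                    # length of the continuation
    do_sample=True,                       # sample instead of greedy decoding
    temperature=0.8,                      # illustrative sampling settings
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,  # silences the missing-pad-token warning
)
print(tokenizer.decode(tokens[0]))
```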
To ask questions about this model, join the EleutherAI Discord and post them in #release-discussion. Please read the existing Pythia documentation before asking about it in the EleutherAI Discord. For general correspondence: contact@eleuther.ai.
| Pythia model | Non-Embedding Params | Layers | Model Dim | Heads | Batch Size | Learning Rate | Equivalent Models |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 70M | 18,915,328 | 6 | 512 | 8 | 2M | 1.0 × 10⁻³ | — |
| 160M | 85,056,000 | 12 | 768 | 12 | 2M | 6.0 × 10⁻⁴ | GPT-Neo 125M, OPT-125M |
| 410M | 302,311,424 | 24 | 1024 | 16 | 2M | 3.0 × 10⁻⁴ | OPT-350M |
| 1.0B | 805,736,448 | 16 | 2048 | 8 | 2M | 3.0 × 10⁻⁴ | — |
| 1.4B | 1,208,602,624 | 24 | 2048 | 16 | 2M | 2.0 × 10⁻⁴ | GPT-Neo 1.3B, OPT-1.3B |
| 2.8B | 2,517,652,480 | 32 | 2560 | 32 | 2M | 1.6 × 10⁻⁴ | GPT-Neo 2.7B, OPT-2.7B |
| 6.9B | 6,444,163,072 | 32 | 4096 | 32 | 2M | 1.2 × 10⁻⁴ | OPT-6.7B |
| 12B | 11,327,027,200 | 36 | 5120 | 40 | 2M | 1.2 × 10⁻⁴ | — |

Engineering details for the Pythia Suite. Deduped and non-deduped models of a given size have the same hyperparameters. "Equivalent" models have exactly the same architecture and the same number of non-embedding parameters.
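As a back-of-the-envelope check, the non-embedding parameter counts in the table can be reproduced from each model's layer count and model dimension: in the GPT-NeoX architecture, every layer contributes 12·d² weights (the QKV projection, the attention output projection, and the two 4×-expansion MLP projections) plus 13·d bias and LayerNorm parameters, and a final LayerNorm adds 2·d. This decomposition is an interpretation of the architecture offered here for illustration, not something stated in the table:

```python
# Reproduce the non-embedding parameter counts from layer count and model dimension.
# Per layer: 12*d*d weights (QKV, attention output, 4x MLP up/down projections)
#            plus 13*d biases and LayerNorm parameters; a final LayerNorm adds 2*d.
def non_embedding_params(layers: int, dim: int) -> int:
    return layers * (12 * dim * dim + 13 * dim) + 2 * dim

assert non_embedding_params(6, 512) == 18_915_328        # Pythia-70M
assert non_embedding_params(24, 1024) == 302_311_424     # Pythia-410M
assert non_embedding_params(36, 5120) == 11_327_027_200  # Pythia-12B
```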
Uses and Limitations
Intended Use
The primary intended use of Pythia is research on the behavior, functionality, and limitations of large language models. You may also further fine-tune and adapt Pythia-410M for deployment, as long as your use is in accordance with the Apache 2.0 license.
Out-of-scope use
The Pythia Suite is not intended for deployment. It is English-language only and not suitable for translation or generating text in other languages. It has also not been fine-tuned for common downstream contexts.
Limitations and biases
Never rely on Pythia-410M to produce factually accurate output. This model was trained on the Pile, which may contain offensive text.
Training
Training data
The Pile is an 825 GiB general-purpose dataset in English. It was not deduplicated before being used to train Pythia-410M.
Training procedure
All models were trained on the exact same data, in the exact same order. Each model saw 299,892,736,000 tokens during training. See GitHub for more details.
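This token count is consistent with the batch size and checkpoint schedule reported elsewhere in this card: a 2M-token batch (2 × 2²⁰ tokens) for the 143,000 steps up to the main-branch checkpoint.

```python
# 2M tokens per batch (2 * 2**20) times 143,000 steps (the main-branch checkpoint)
assert 143_000 * 2 * 2**20 == 299_892_736_000
```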
Evaluations
All 16 Pythia models were evaluated using the [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness). You can access the results by model and step at results/json/* in the GitHub repository.
Changelog
This section summarizes the differences between the previously released Pythia v0 suite and the current models.
All model sizes are now trained with a uniform batch size of 2M tokens.
Added checkpoints at initialization (step 0) and at steps {1, 2, 4, 8, 16, 32, 64, 128, 256, 512}, in addition to a checkpoint every 1,000 training steps (see the sketch below).
Flash Attention was used in the new retrained suite.
Rectified a minor inconsistency in the original suite regarding the learning rate schedule.
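Taken together, this schedule yields the 154 checkpoints per model mentioned above: step 0, ten log-spaced early steps, and 143 evenly spaced steps, each hosted as a branch whose name can be passed as the revision argument. A minimal sketch of enumerating the branch names:

```python
# Branch names for the 154 checkpoints of each Pythia model:
# step0, log-spaced early steps, then every 1,000 steps up to step143000.
revisions = (
    ["step0"]
    + [f"step{2**i}" for i in range(10)]          # step1 ... step512
    + [f"step{i * 1000}" for i in range(1, 144)]  # step1000 ... step143000
)
assert len(revisions) == 154
```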
Naming convention and parameter count
Pythia models were renamed in January 2023. The current naming convention (70M, 160M, etc.) is based on total parameter count.
| Current Pythia suffix | Old suffix | Total params | Non-embedding params |
| --- | --- | --- | --- |
| 70M | 19M | 70,426,624 | 18,915,328 |
🔧 Technical Details
The Pythia models are designed with specific hyperparameters and training procedures to ensure consistent and comparable results across different model sizes. All models in the suite are trained on the same data in the same order, allowing for controlled experiments in interpretability research. The use of specific checkpoints and a uniform batch size during training also contributes to the reproducibility and reliability of the models.
📄 License
This project is licensed under the Apache 2.0 license.