TinyStories-GPT2-3M
This is a tiny GPT-2 model (3M trainable parameters) pre-trained for 3 epochs on the TinyStories V2 dataset, intended to support related research.
Quick Start
This model is a pre-trained GPT-2 model. To replicate the training process, follow the steps in the "Training procedure" section; to use the model directly, see the sketch below.
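As a minimal sketch (the model identifier below is a placeholder, not a repository ID confirmed by this document), the model can be loaded with the HuggingFace Transformers text-generation pipeline:

```python
# Minimal sketch: generate a short story with the text-generation pipeline.
# MODEL_ID is a placeholder; replace it with the model's actual Hub repository ID.
from transformers import pipeline

MODEL_ID = "TinyStories-GPT2-3M"  # placeholder

generator = pipeline("text-generation", model=MODEL_ID)
print(generator("Once upon a time", max_new_tokens=50, do_sample=True)[0]["generated_text"])
```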
Features
- Compact Design: With only 3M trainable parameters, this is a very small model.
- GPT-2 Architecture: Built on the GPT-2 architecture, which is widely supported across tooling, accelerating research.
- Surprising Coherency: Despite its small size, it shows a notable degree of coherency in text generation.
Installation
No specific installation steps are provided in the original document; a standard HuggingFace Transformers environment (see "Framework versions" below) is sufficient to load the model.
Usage Examples
No code examples are provided in the original document; an illustrative sketch of loading the model and generating a story is given below.
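As an illustrative sketch (again, the model identifier is a placeholder rather than a confirmed repository ID), generation with AutoTokenizer and AutoModelForCausalLM looks like this:

```python
# Sketch: load the model and sample a short story.
# MODEL_ID is a placeholder; replace it with the model's actual Hub repository ID.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "TinyStories-GPT2-3M"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()

prompt = "One day, a little girl named Lily found a shiny stone."
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token by default
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because the training vocabulary is minimal, prompts far outside the TinyStories domain will mostly be treated as character names or ignored (see "Intended uses & limitations" below).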
Documentation
Model description
TinyStories-GPT2-3M is a replication of the TinyStories model, using a GPT-2 architecture in place of GPT-Neo. This was a deliberate choice made to accelerate research, as the GPT-2 architecture is more widely supported across tooling. We do not contribute any performance improvements of note, though similarly to the original model, we find a surprising degree of coherency within the model, given its size.
Intended uses & limitations
Research use only - NOT suitable for commercial use, per OpenAI's Terms of Service on using their APIs to source training data.
Note that the vocabulary this model was trained on is quite minimal. Out-of-distribution inputs will not work as well as with a larger, more general-purpose model. To observe this behaviour, try generating a few tokens after a non-trivial word like "Biology". The model typically treats words that did not frequently appear in training as character names in a story.
All training data is English. As such, input in other languages is out of distribution, and will result in the model treating the previous input as character names, ignoring it entirely, or generating meaningless tokens.
Training and evaluation data
Trained for 3 epochs on the TinyStories V2 dataset, produced by GPT-4.
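For convenience, the two text files can be fetched from the HuggingFace Hub. The sketch below assumes the dataset is hosted at roneneldan/TinyStories with the file names used in the training command further down; both are assumptions, so adjust them to wherever you obtain the data.

```python
# Sketch: download the TinyStories V2 (GPT-4) train/validation text files into data/.
# The dataset repository ID is an assumption; adjust if the files live elsewhere.
from huggingface_hub import hf_hub_download

for split in ("train", "valid"):
    path = hf_hub_download(
        repo_id="roneneldan/TinyStories",            # assumed dataset repository
        filename=f"TinyStoriesV2-GPT4-{split}.txt",  # matches the paths in the training command
        repo_type="dataset",
        local_dir="data",
    )
    print(path)
```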
Training procedure
Trained for 400k steps (~7 hours) on 2xH100 80GB PCIe with 32vCPU and 500GB RAM on Runpod.
To replicate, download the GPT-4 V2 version of the TinyStories dataset alongside HuggingFace's train_clm.py script, then run the following:
```bash
#!/bin/bash
# Train a reduced GPT-2 (n_embd=64, n_layer=8, n_head=16) from scratch on TinyStories V2.
python train_clm.py \
    --model_type=gpt2 \
    --config_overrides=n_embd=64,n_layer=8,n_head=16 \
    --tokenizer_name=gpt2 \
    --train_file="data/TinyStoriesV2-GPT4-train.txt" \
    --validation_file="data/TinyStoriesV2-GPT4-valid.txt" \
    --block_size=256 \
    --preprocessing_num_workers=8 \
    --output_dir="out" \
    --logging_dir="./log" \
    --logging_steps=100 \
    --logging_strategy=steps \
    --save_steps=5000 \
    --save_total_limit=10 \
    --do_train
```
Training hyperparameters
The following hyperparameters were used during training:
| Property | Details |
| --- | --- |
| n_embd | 64 |
| n_layer | 8 |
| n_head | 16 |
| learning_rate | 5e-05 |
| train_batch_size | 16 |
| eval_batch_size | 16 |
| seed | 42 |
| optimizer | Adam with betas=(0.9,0.999) and epsilon=1e-08 |
| lr_scheduler_type | linear |
| num_epochs | 3.0 |
Framework versions
| Property | Details |
| --- | --- |
| Transformers | 4.35.0.dev0 |
| Pytorch | 2.0.1+cu118 |
| Datasets | 2.14.5 |
| Tokenizers | 0.14.1 |
Technical Details
The model uses the GPT-2 architecture with a reduced configuration (n_embd=64, n_layer=8, n_head=16). Training ran for 400k steps (~7 hours, 3 epochs) on 2xH100 80GB PCIe with 32vCPU and 500GB RAM on Runpod. GPT-2 was chosen over GPT-Neo to take advantage of its widespread tooling support, accelerating research.
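To make the configuration concrete, the --config_overrides values above correspond to a reduced GPT2Config; a minimal sketch, assuming all other settings keep their GPT-2 defaults:

```python
# Sketch: instantiate the reduced GPT-2 configuration and inspect its parameter count.
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(n_embd=64, n_layer=8, n_head=16)  # remaining settings are GPT-2 defaults
model = GPT2LMHeadModel(config)

# Count parameters excluding the (comparatively large) token embedding matrix.
print(f"{model.num_parameters(exclude_embeddings=True):,} non-embedding parameters")
```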
License
No license information is provided in the original document.