🚀 TinyStories-656K
This is a language model (LM) trained from scratch on the TinyStoriesV2 dataset. It is a small transformer-based model capable of generating short stories with only about 656K parameters.
✨ Features
- Llama Architecture: Adopts the Llama architecture for efficient and effective language processing.
- GQA: Utilizes Grouped Query Attention (GQA), which shares key/value heads across groups of query heads to reduce attention parameters and memory.
- Hidden Size: The hidden size is set to 128, balancing model complexity and performance.
- Tie Word Embeddings: Uses `tie_word_embeddings` to share the input and output embedding matrices, reducing parameter count.
- Vocabulary Size: A `vocab_size` of 2048, with a Byte Pair Encoding (BPE) tokenizer trained from scratch on TinyStoriesV2.
- Transformer Layers: Consists of 2 Transformer layers (see the configuration sketch below).
Code: Here
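The feature list above roughly corresponds to a Hugging Face `LlamaConfig` along the following lines. This is a sketch only: the attention head counts and `intermediate_size` are illustrative assumptions and are not stated in the feature list.

```python
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=2048,           # BPE vocabulary trained on TinyStoriesV2
    hidden_size=128,
    num_hidden_layers=2,
    tie_word_embeddings=True,  # share input/output embedding matrices
    # Assumed values for illustration; the released checkpoint may differ.
    num_attention_heads=4,
    num_key_value_heads=2,     # GQA: fewer key/value heads than query heads
    intermediate_size=384,
)
model = LlamaForCausalLM(config)
print(f"{model.num_parameters():,} parameters")  # on the order of 656K
```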
💻 Usage Examples
Basic Usage
The template for story generation is as follows:
```
<|start_story|>Once upon a time,
```
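A minimal loading-and-generation sketch is shown below; the checkpoint path is a placeholder, and the generation settings here are simple defaults rather than the recommended configuration given further down.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; substitute the actual checkpoint location.
model_id = "TinyStories-656K"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Prompt the model with the story template.
prompt = "<|start_story|>Once upon a time,"
inputs = tokenizer(prompt, return_tensors="pt")

output = model.generate(**inputs, max_new_tokens=256, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```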
Advanced Usage
Here is an example of a generated story:
Once upon a time, there was a little boy named Tim. Tim had a toy car that he loved to play with. One day, he went to the park with his mom. Tim saw a toy car on the ground. Tim wanted to play with the car to his mom and said, "Mom, can I play with your car with my car too?"
His mom said, "Yes, but we must not take turns." Tim felt sad, but he knew he had to go. He asked his mom for help. His mom said, "Okay, let's clean it together." They went to play together and played the toy car. They had a lot of fun.
After they finished the car together, Tim and his mom were surprised. They did not know that the car was not a toy car like it was a magic car. Tim had an idea. He put the car in the car and put the car on it. He pushed the car on the car on the car car and pulled it down. Tim was so happy. He played with the car with his car all day long, and Tim was very happy.<|end_story|>
The recommended generation configuration is:
```python
do_sample=True,
top_k=40,
top_p=0.9,
temperature=0.6
```
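Continuing the sketch above, these settings can be passed directly to `generate()`; `max_new_tokens` is an added assumption to bound output length and is not part of the recommended configuration.

```python
output = model.generate(
    **inputs,
    do_sample=True,
    top_k=40,
    top_p=0.9,
    temperature=0.6,
    max_new_tokens=512,  # assumption: output length cap chosen for illustration
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```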
🔧 Technical Details
Full Training Arguments
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",  # assumption: output directory not specified in the original
    do_train=True,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    learning_rate=0.004629403549377777,
    lr_scheduler_type="constant",
    bf16=True,
    logging_steps=5,
    num_train_epochs=2,
    save_steps=10000000,  # effectively disables intermediate checkpoints
    seed=3407,
    report_to=None,
)
```
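For context, here is a rough sketch of how these arguments would typically be wired into a `Trainer`. The `model`, `tokenizer`, and `train_dataset` objects stand in for the Llama model, BPE tokenizer, and tokenized TinyStoriesV2 data described above; they are placeholders, not part of the original configuration.

```python
from transformers import Trainer, DataCollatorForLanguageModeling

# `model`, `tokenizer`, and `train_dataset` are placeholders (see note above).
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```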
📄 License
This project is licensed under the Apache-2.0 license.