🚀 TinyStories-656K
This is a language model (LM) trained from scratch on the TinyStoriesV2 dataset. It is a small transformer-based model capable of generating short stories with only about 656K parameters.
✨ Features
- Llama Architecture: Adopts the Llama architecture for efficient and effective language processing.
- GQA: Utilizes Grouped Query Attention (GQA), which shares key/value heads across groups of query heads to reduce attention parameters and memory.
- Hidden Size: The hidden size is set to 128, balancing model complexity and performance.
- Tie Word Embeddings: Uses `tie_word_embeddings` to share the input and output embedding matrices, reducing parameter count.
- Vocabulary Size: A `vocab_size` of 2048, with a Byte Pair Encoding (BPE) tokenizer trained from scratch on TinyStoriesV2.
- Transformer Layers: Consists of 2 Transformer layers (see the configuration sketch below).
Code: Here
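The feature list above roughly corresponds to a Hugging Face `LlamaConfig` along the following lines. This is a sketch only: the attention head counts and `intermediate_size` are illustrative assumptions and are not stated in the feature list.

```python
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=2048,           # BPE vocabulary trained on TinyStoriesV2
    hidden_size=128,
    num_hidden_layers=2,
    tie_word_embeddings=True,  # share input/output embedding matrices
    # Assumed values for illustration; the released checkpoint may differ.
    num_attention_heads=4,
    num_key_value_heads=2,     # GQA: fewer key/value heads than query heads
    intermediate_size=384,
)
model = LlamaForCausalLM(config)
print(f"{model.num_parameters():,} parameters")  # on the order of 656K
```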
💻 Usage Examples
Basic Usage
The template for story generation is as follows:
```
<|start_story|>Once upon a time,
```
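A minimal loading-and-generation sketch is shown below; the checkpoint path is a placeholder, and the generation settings here are simple defaults rather than the recommended configuration given further down.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; substitute the actual checkpoint location.
model_id = "TinyStories-656K"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Prompt the model with the story template.
prompt = "<|start_story|>Once upon a time,"
inputs = tokenizer(prompt, return_tensors="pt")

output = model.generate(**inputs, max_new_tokens=256, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```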
Advanced Usage
Here is an example of a generated story:
Once upon a time, there was a little boy named Tim. Tim had a toy car that he loved to play with. One day, he went to the park with his mom. Tim saw a toy car on the ground. Tim wanted to play with the car to his mom and said, "Mom, can I play with your car with my car too?"
His mom said, "Yes, but we must not take turns." Tim felt sad, but he knew he had to go. He asked his mom for help. His mom said, "Okay, let's clean it together." They went to play together and played the toy car. They had a lot of fun.
After they finished the car together, Tim and his mom were surprised. They did not know that the car was not a toy car like it was a magic car. Tim had an idea. He put the car in the car and put the car on it. He pushed the car on the car on the car car and pulled it down. Tim was so happy. He played with the car with his car all day long, and Tim was very happy.<|end_story|>
The recommended generation configuration is:
```python
do_sample=True,
top_k=40,
top_p=0.9,
temperature=0.6
```
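Continuing the sketch above, these settings can be passed directly to `generate()`; `max_new_tokens` is an added assumption to bound output length and is not part of the recommended configuration.

```python
output = model.generate(
    **inputs,
    do_sample=True,
    top_k=40,
    top_p=0.9,
    temperature=0.6,
    max_new_tokens=512,  # assumption: output length cap chosen for illustration
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```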
🔧 Technical Details
Full Training Arguments
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",  # assumption: output directory not specified in the original
    do_train=True,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    learning_rate=0.004629403549377777,
    lr_scheduler_type="constant",
    bf16=True,
    logging_steps=5,
    num_train_epochs=2,
    save_steps=10000000,  # effectively disables intermediate checkpoints
    seed=3407,
    report_to=None,
)
```
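For context, here is a rough sketch of how these arguments would typically be wired into a `Trainer`. The `model`, `tokenizer`, and `train_dataset` objects stand in for the Llama model, BPE tokenizer, and tokenized TinyStoriesV2 data described above; they are placeholders, not part of the original configuration.

```python
from transformers import Trainer, DataCollatorForLanguageModeling

# `model`, `tokenizer`, and `train_dataset` are placeholders (see note above).
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```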
📄 License
This project is licensed under the Apache-2.0 license.