🚀 TinyStories-656K
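This is a language model trained from scratch on the TinyStoriesV2 dataset. The goal is a Transformer language model that can generate stories with only around 600k parameters.
🚀 Quick Start
This project builds a lightweight Transformer language model that generates stories after being trained on the TinyStoriesV2 dataset. The project code is available at the following link: Here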
✨ Key Features
- Architecture: Llama.
- Attention: GQA (Grouped Query Attention).
- Parameters: hidden size of 128; tie_word_embeddings enabled; vocabulary size of 2048 (BPE tokenizer trained from scratch on TinyStoriesV2); 2 Transformer layers.
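To see how these settings add up to a model of this size, the parameter count of a Llama-style network can be worked out by hand. The sketch below is only an illustration: the vocabulary size, hidden size, layer count, and weight tying come from the list above, while `intermediate_size=384`, `num_heads=8`, and `num_kv_heads=4` are assumptions that do not appear in this README. They were picked because they happen to reproduce the number in the model name, and the actual configuration may differ.

```python
def llama_param_count(vocab_size, hidden_size, num_layers,
                      intermediate_size, num_heads, num_kv_heads,
                      tie_word_embeddings=True):
    """Count the weights of a bias-free Llama-style decoder."""
    head_dim = hidden_size // num_heads
    kv_dim = num_kv_heads * head_dim  # GQA: K and V use fewer heads than Q

    embed = vocab_size * hidden_size
    # With tied embeddings the output head reuses the embedding matrix.
    lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size

    attn = 2 * hidden_size * hidden_size   # q_proj and o_proj
    attn += 2 * hidden_size * kv_dim       # k_proj and v_proj
    mlp = 3 * hidden_size * intermediate_size  # gate, up, and down projections
    norms = 2 * hidden_size                # two RMSNorm weights per layer

    final_norm = hidden_size
    return embed + lm_head + num_layers * (attn + mlp + norms) + final_norm


# intermediate_size, num_heads, and num_kv_heads are hypothetical values.
total = llama_param_count(2048, 128, 2,
                          intermediate_size=384, num_heads=8, num_kv_heads=4)
untied = llama_param_count(2048, 128, 2,
                           intermediate_size=384, num_heads=8, num_kv_heads=4,
                           tie_word_embeddings=False)
print(total)   # 656000
print(untied)  # 918144
```

Under these assumptions the embedding table alone accounts for 262,144 of the weights, which is why tying the input and output embeddings matters so much at this scale.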
📦 Installation
No installation steps are provided yet.
💻 Usage Examples
Basic usage
The full set of training arguments is shown below:
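from transformers import TrainingArguments

training_args = TrainingArguments(
    do_train=True,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,   # no gradient accumulation
    learning_rate=0.004629403549377777,
    lr_scheduler_type="constant",    # no warmup or decay
    bf16=True,                       # bfloat16 mixed precision
    logging_steps=5,
    num_train_epochs=2,
    save_steps=10000000,             # very large interval, so periodic checkpoints are effectively skipped
    seed=3407,
    report_to=None,
)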
Advanced usage
Generation template
<|start_story|>Once upon a time,
Generation example
Once upon a time, there was a little boy named Tim. Tim had a toy car that he loved to play with. One day, he went to the park with his mom. Tim saw a toy car on the ground. Tim wanted to play with the car to his mom and said, "Mom, can I play with your car with my car too?"
His mom said, "Yes, but we must not take turns." Tim felt sad, but he knew he had to go. He asked his mom for help. His mom said, "Okay, let's clean it together." They went to play together and played the toy car. They had a lot of fun.
After they finished the car together, Tim and his mom were surprised. They did not know that the car was not a toy car like it was a magic car. Tim had an idea. He put the car in the car and put the car on it. He pushed the car on the car on the car car and pulled it down. Tim was so happy. He played with the car with his car all day long, and Tim was very happy.<|end_story|>
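The template and the sample above suggest that a story is wrapped between `<|start_story|>` and `<|end_story|>`. A minimal sketch of helpers for building a prompt and trimming a decoded completion follows. The function names `build_prompt` and `extract_story` are hypothetical, and whether the markers actually appear in decoded text depends on the tokenizer and decoding settings, which this README does not specify.

```python
START_MARKER = "<|start_story|>"
END_MARKER = "<|end_story|>"


def build_prompt(opening="Once upon a time,"):
    """Prefix the story opening with the start marker from the template."""
    return START_MARKER + opening


def extract_story(generated):
    """Strip the start marker and cut everything from the end marker onward."""
    text = generated
    if text.startswith(START_MARKER):
        text = text[len(START_MARKER):]
    end = text.find(END_MARKER)
    if end != -1:
        text = text[:end]
    return text.strip()


prompt = build_prompt()
story = extract_story(
    "<|start_story|>Once upon a time, Tim was happy.<|end_story|>extra"
)
print(prompt)
print(story)
```

Cutting at the end marker is useful when generation runs to a fixed token limit and continues past the end of the story.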
Recommended generation config
do_sample=True,
top_k=40,
top_p=0.9,
temperature=0.6
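To make the effect of these four settings concrete, here is a small standard-library sketch of one sampling step: temperature scaling, then top-k filtering, then top-p (nucleus) filtering, then a random draw. It is an illustration of the general technique and not the code any particular library runs; the function name `sample_next_token` and the toy logits are made up for the example.

```python
import math
import random


def sample_next_token(logits, top_k=40, top_p=0.9, temperature=0.6, rng=random):
    """Pick one token id from raw logits using temperature, top-k, and top-p."""
    # Temperature below 1 sharpens the distribution toward likely tokens.
    scaled = [x / temperature for x in logits]

    # Top-k: keep only the k highest-scoring token ids, best first.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    order = order[:top_k]

    # Softmax over the surviving tokens (shifted by the max for stability).
    peak = scaled[order[0]]
    weights = [math.exp(scaled[i] - peak) for i in order]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept_ids, kept_probs, cumulative = [], [], 0.0
    for token_id, p in zip(order, probs):
        kept_ids.append(token_id)
        kept_probs.append(p)
        cumulative += p
        if cumulative >= top_p:
            break

    # Sample among the remaining candidates; choices() renormalizes weights.
    return rng.choices(kept_ids, weights=kept_probs, k=1)[0]


toy_logits = [2.0, 1.5, 0.3, -1.0, -3.0]  # hypothetical 5-token vocabulary
token = sample_next_token(toy_logits, rng=random.Random(0))
print(token)
```

With a temperature of 0.6 most of the probability mass concentrates on a few tokens, so the top-p cutoff often leaves far fewer than 40 candidates.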
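🔧 Technical Details
The model is trained from scratch on the TinyStoriesV2 dataset and aims to generate stories with a small parameter budget (about 600k). It uses the Llama architecture with GQA, a hidden size of 128, tie_word_embeddings, a vocabulary of 2048 tokens from a BPE tokenizer trained on TinyStoriesV2, and 2 Transformer layers. Training uses the arguments listed under Basic usage, including a constant learning rate of roughly 0.00463 and a per-device batch size of 16.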
📄 License
This project is released under the Apache-2.0 license.