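🚀 Simple Stories Models
Simple Stories is a series of small text generation models trained on the TinyStories dataset. The goal is to explore building small language models that can perform highly specific tasks; in this project, that task is generating children's stories.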
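🚀 Quick Start
Model Details
The model has 4 million parameters (Safetensors seems to inflate this to 13 million; I will look into why in the future). It has not been instruction-tuned: when prompted, it simply continues the text. I will start working on an instruct model in the next few days.
The model is a decoder-only Transformer with 4 decoder layers and 2 attention heads. After training for only 3 epochs on roughly 50 MB of text, it can already generate semi-coherent stories.
The code used to train this model is available on my GitHub.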
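As a rough sketch of where a parameter count of this size comes from, the snippet below estimates the parameters of a decoder-only Transformer from its hyperparameters. It assumes a GPT-2-style layout (learned positional embeddings, biases on every linear layer, a 4x MLP expansion), which is not confirmed for this model. Only the 4 layers and 2 heads come from the description above; `vocab_size`, `d_model`, and `context_len` are hypothetical values chosen for illustration, so the printed numbers are not this model's real counts.

```python
def estimate_params(vocab_size, d_model, n_layers, n_heads, context_len,
                    tie_embeddings=True):
    """Rough parameter count for a GPT-2-style decoder-only Transformer.

    Assumed layout (NOT confirmed for simple-stories-4M): learned positional
    embeddings, biases on all linear layers, MLP hidden size of 4 * d_model.
    """
    # Heads split d_model between them, so they do not change the count.
    assert d_model % n_heads == 0, "d_model must be divisible by n_heads"

    token_embed = vocab_size * d_model
    pos_embed = context_len * d_model

    # Q, K, V and output projections, each d_model x d_model plus a bias.
    attention = 4 * (d_model * d_model + d_model)
    # Up-projection to 4*d_model and down-projection back, with biases.
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    # Two layer norms per block, each with a weight and a bias vector.
    norms = 2 * 2 * d_model
    per_layer = attention + mlp + norms

    final_norm = 2 * d_model
    # An untied output head stores a second vocab_size x d_model matrix.
    lm_head = 0 if tie_embeddings else vocab_size * d_model

    return token_embed + pos_embed + n_layers * per_layer + final_norm + lm_head


if __name__ == "__main__":
    # Hypothetical hyperparameters; only n_layers=4 and n_heads=2 are from the README.
    for tied in (True, False):
        total = estimate_params(vocab_size=1000, d_model=64, n_layers=4,
                                n_heads=2, context_len=128, tie_embeddings=tied)
        print(f"tie_embeddings={tied}: {total:,} parameters")
```

Under these assumptions the number of attention heads has no effect on the total, while tying or untying the output head shifts it by a full embedding matrix. This is only an illustration of how counting conventions can differ, not an explanation of the Safetensors figure.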
Usage Example
Basic Usage
1. Import the relevant HuggingFace auto classes, then load the model and tokenizer:
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("broskicodes/simple-stories-4M", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("broskicodes/simple-stories-4M", trust_remote_code=True)
2. Tokenize the input sequence and call the generate function:
inputs = tokenizer("Once upon a time,", return_tensors="pt", return_attention_mask=False)
outputs = model.model.generate(inputs['input_ids'], 250)
Note that this calls model.model.generate, not model.generate.
3. Decode the output and print the text:
text = tokenizer.batch_decode(outputs)[0]
print(text)
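Because generation in the steps above runs for a fixed length (the 250 passed to generate), the decoded text can stop in the middle of a sentence. A minimal sketch of one way to tidy that up is shown below. `trim_to_last_sentence` is a hypothetical helper written for this example; it is not part of the model repository or of transformers, and it uses only the Python standard library.

```python
import re


def trim_to_last_sentence(text: str) -> str:
    """Cut generated text back to the last complete sentence.

    A sentence end is taken to be '.', '!' or '?', optionally followed by
    closing quotes or brackets. This simple rule is enough for short
    children's stories but would mis-handle decimals or abbreviations.
    """
    matches = list(re.finditer(r'[.!?]["\')\]]*', text))
    if not matches:
        # No sentence-ending punctuation found: return the text unchanged.
        return text.strip()
    return text[:matches[-1].end()].strip()


if __name__ == "__main__":
    # Stand-in string; in practice this would be the decoded model output.
    sample = "Once upon a time, there was a little girl. She went to the"
    print(trim_to_last_sentence(sample))
```

In the steps above it would be applied to the decoded string, for example `print(trim_to_last_sentence(text))`.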
Sample Output
Here is a short sample generated by the model:
Once upon a time, there was a little girl called Daisy. Daisy wanted to go to the park with her mommy. She packed some yummy food and chirpies and carried them . Daisy was so excited for her mommy to try. The puppy and Mommy brought a big spoon to make souping. Daisy loved swimming and jun ate until she was disappointed. They began to start playing in the garden. They gathered around and ate and boot into the bread . As Daisy got hungry on the grass, she found some magic. She read more to see what was Luckily, Daisy was very impressed. When the lady opened the pot, something tickling to another. It was a rare. Daisy was so happy that she gave the tunately. Daisy was no longer scared. She knew she had to tell Mommy at the store. She took her to the soup and opened the tasty hot chocolate. When Daisy gave it to Daisy and princessed around a special spoon every day.
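Although the story is not entirely logical, most of the words are valid English, and the characters and overall plot are coherent, which is already progress.
🔮 Future Plans
The next step is to create an instruct model for interacting with and generating custom stories. After that, I will keep working on improving the base model, increasing the amount of training data, and experimenting with different hyperparameters.
If you have any suggestions or questions, or want to discuss anything about the model, reach out to me on Twitter @_broskitweets.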
📄 License
This project is released under the MIT License.
📦 Dataset
This model was trained on the roneneldan/TinyStories dataset.