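🚀 Simple Stories Model
Simple Stories is a series of small text-generation models trained on the TinyStories dataset. The goal is to explore building small language models that can perform highly specific tasks; in this project, that task is generating children's stories.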
🚀 Quick Start
Model Details
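The model has 4 million parameters (Safetensors seems to inflate this to 13 million; I will look into why in the future). It has not been fine-tuned for instructions, so when prompted it simply outputs text. I will start working on an instruction model in the coming days.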
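One way to check the parameter count directly is to sum the element counts of the model's tensors. The sketch below is a minimal illustration, not the author's method: count_parameters is a hypothetical helper name, and it assumes the loaded model behaves like a standard PyTorch module that exposes parameters().

```python
def count_parameters(model):
    """Return the total number of elements across all parameter tensors.

    Assumption: `model` exposes a PyTorch-style `parameters()` iterator
    whose items have a `numel()` method (true for torch.nn.Module).
    """
    return sum(p.numel() for p in model.parameters())
```

With the model loaded as in the Basic Usage steps, count_parameters(model) gives a figure that can be compared against both the 4 million and the 13 million numbers mentioned above.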
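It is a decoder-only Transformer with 4 decoder layers and 2 attention heads. After training for only 3 epochs on about 50MB of text, it can already generate semi-coherent stories.
The code used to train the model is available on my GitHub.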
Usage Examples
Basic Usage
1. Import the relevant HuggingFace auto classes and load the model and tokenizer:
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("broskicodes/simple-stories-4M", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("broskicodes/simple-stories-4M", trust_remote_code=True)
2. Tokenize the input sequence and call the generate function:
inputs = tokenizer("Once upon a time,", return_tensors="pt", return_attention_mask=False)
outputs = model.model.generate(inputs['input_ids'], 250)
Note that this calls model.model.generate, not model.generate.
3. Decode the output and print the text:
text = tokenizer.batch_decode(outputs)[0]
print(text)
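The three steps above can be wrapped in a single helper so that different prompts are easy to try. This is only a sketch: generate_story is a hypothetical name, and treating the second positional argument (250 above) as a budget of new tokens is an assumption that the source does not confirm.

```python
def generate_story(model, tokenizer, prompt, max_new_tokens=250):
    """Tokenize `prompt`, generate a continuation, and return the decoded text.

    Assumptions: `model` and `tokenizer` were loaded as in step 1, and the
    second positional argument of `model.model.generate` is a token budget.
    """
    # Step 2: tokenize without an attention mask, as in the original example.
    inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False)
    # Call the inner model's generate, not model.generate.
    outputs = model.model.generate(inputs["input_ids"], max_new_tokens)
    # Step 3: decode the first (and only) sequence in the batch.
    return tokenizer.batch_decode(outputs)[0]
```

For example, print(generate_story(model, tokenizer, "Once upon a time,")) reproduces the flow shown in the steps above.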
Sample Output
Here is a short sample generated by the model:
Once upon a time, there was a little girl called Daisy. Daisy wanted to go to the park with her mommy. She packed some yummy food and chirpies and carried them . Daisy was so excited for her mommy to try. The puppy and Mommy brought a big spoon to make souping. Daisy loved swimming and jun ate until she was disappointed. They began to start playing in the garden. They gathered around and ate and boot into the bread . As Daisy got hungry on the grass, she found some magic. She read more to see what was Luckily, Daisy was very impressed. When the lady opened the pot, something tickling to another. It was a rare. Daisy was so happy that she gave the tunately. Daisy was no longer scared. She knew she had to tell Mommy at the store. She took her to the soup and opened the tasty hot chocolate. When Daisy gave it to Daisy and princessed around a special spoon every day.
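Although the story is not entirely logical, most of the words are valid English, and the characters and overall plot are coherent, which is already progress.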
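🔮 Future Plans
The next step is to create an instruction model that can be interacted with to generate custom stories. After that, I will keep working on the base model by increasing the amount of training data and continuing to experiment with different hyperparameters.
If you have any suggestions or questions, or want to discuss anything about the model, reach out to me on Twitter at @_broskitweets.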
📄 License
This project is released under the MIT License.
📦 Dataset
This model was trained on the roneneldan/TinyStories dataset.