🚀 smol_llama: 220M GQA
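A small decoder model with 220M total parameters; this is the first version of the model. It can be used for tasks such as text generation.
🚀 Quick Start
Model features
- Hidden size of 1024, with 10 layers.
- GQA (32 attention heads, 8 key-value heads), context length of 2048.
- Can be trained from scratch on a single GPU 😊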
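To make the GQA numbers above concrete, here is a minimal sketch of the shape arithmetic. It assumes the usual Llama convention that the per-head dimension is the hidden size divided by the number of attention heads, 2 bytes per cached value (fp16/bf16), and a batch size of 1; none of these is stated in the card, and `ModelShape` is a hypothetical helper, not part of any library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelShape:
    """Shape values taken from the feature list above."""
    hidden_size: int = 1024
    num_layers: int = 10
    num_attention_heads: int = 32
    num_key_value_heads: int = 8
    context_length: int = 2048

    @property
    def head_dim(self) -> int:
        # Assumption: head_dim = hidden_size / num_attention_heads.
        return self.hidden_size // self.num_attention_heads

    @property
    def queries_per_kv_head(self) -> int:
        # Number of query heads that share one key-value head under GQA.
        return self.num_attention_heads // self.num_key_value_heads

    def kv_cache_bytes(self, bytes_per_value: int = 2,
                       kv_heads: Optional[int] = None) -> int:
        """KV-cache size for one full-length sequence (batch size 1)."""
        heads = self.num_key_value_heads if kv_heads is None else kv_heads
        # Two tensors (keys and values) per layer, one head_dim vector
        # per KV head per token.
        return (2 * self.num_layers * heads * self.head_dim
                * self.context_length * bytes_per_value)


shape = ModelShape()
gqa_bytes = shape.kv_cache_bytes()
# Hypothetical comparison: plain multi-head attention with 32 KV heads.
mha_bytes = shape.kv_cache_bytes(kv_heads=shape.num_attention_heads)

print(f"head_dim={shape.head_dim}, queries per KV head={shape.queries_per_kv_head}")
print(f"KV cache at 2048 tokens: GQA {gqa_bytes / 2**20:.0f} MiB, "
      f"MHA {mha_bytes / 2**20:.0f} MiB")
```

Under these assumptions, sharing each key-value head across 4 query heads cuts the KV cache to a quarter of what 32 independent key-value heads would need.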
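Model links
Here are some fine-tunes we did, but there are many more possibilities out there!
- Instruction fine-tunes
  - openhermes - link
  - open-instruct - link
- Code-related models
- Zephyr DPO fine-tune
📚 Detailed Documentation
Detailed results can be found here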
| Metric | Value |
|---|---|
| Avg. | 29.44 |
| AI2 Reasoning Challenge (25-shot) | 24.83 |
| HellaSwag (10-shot) | 29.76 |
| MMLU (5-shot) | 25.85 |
| TruthfulQA (0-shot) | 44.55 |
| Winogrande (5-shot) | 50.99 |
| GSM8k (5-shot) | 0.68 |
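As a quick sanity check, the reported average for this first set of results can be reproduced as a plain arithmetic mean of the six benchmark scores. The sketch below only re-derives that number from the values listed above; the dictionary keys are shortened labels chosen for illustration.

```python
# Scores copied from the results listed above.
scores = {
    "ARC (25-shot)": 24.83,
    "HellaSwag (10-shot)": 29.76,
    "MMLU (5-shot)": 25.85,
    "TruthfulQA (0-shot)": 44.55,
    "Winogrande (5-shot)": 50.99,
    "GSM8k (5-shot)": 0.68,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 29.44
```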
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 6.62 |
| IFEval (0-shot) | 23.86 |
| BBH (3-shot) | 3.04 |
| MATH Lvl 5 (4-shot) | 0.00 |
| GPQA (0-shot) | 0.78 |
| MuSR (0-shot) | 9.07 |
| MMLU-PRO (5-shot) | 1.66 |
📄 License
This project is licensed under the Apache 2.0 license.
📦 Datasets
- JeanKaddour/minipile
- pszemraj/simple_wikipedia_LM
- mattymchen/refinedweb-3m
- BEE-spoke-data/knowledge-inoc-concat-v1
💻 Inference Parameters
{
  "parameters": {
    "max_new_tokens": 64,
    "do_sample": true,
    "temperature": 0.8,
    "repetition_penalty": 1.05,
    "no_repeat_ngram_size": 4,
    "eta_cutoff": 0.0006,
    "renormalize_logits": true
  }
}
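The parameters above map directly onto keyword arguments of `generate` in the Hugging Face `transformers` library. The following is a minimal sketch of one way to apply them, not the authors' own inference code. The repository id in `MODEL_ID` is an assumption (the card does not state it), and `generate` here is a hypothetical wrapper function.

```python
import json

# Inference parameters copied from the card.
CARD_CONFIG = """
{
  "parameters": {
    "max_new_tokens": 64,
    "do_sample": true,
    "temperature": 0.8,
    "repetition_penalty": 1.05,
    "no_repeat_ngram_size": 4,
    "eta_cutoff": 0.0006,
    "renormalize_logits": true
  }
}
"""

# Assumed Hub repository id; the card itself does not state it.
MODEL_ID = "BEE-spoke-data/smol_llama-220M-GQA"

# JSON true/false become Python True/False, so the dict can be
# passed straight through as generation keyword arguments.
generation_kwargs = json.loads(CARD_CONFIG)["parameters"]


def generate(prompt: str, model_id: str = MODEL_ID) -> str:
    """Continue `prompt` using the card's sampling settings."""
    # Imported lazily so the parameter handling above works even
    # when transformers is not installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **generation_kwargs)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Kennesaw State University is a public")` would download the weights on first use and return the prompt plus up to 64 sampled tokens. Because `do_sample` is enabled, the output varies between runs.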
📋 Inference Examples
| Example title | Input text |
|---|---|
| El Microondas | My name is El Microondas the Wise, and |
| Kennesaw State University | Kennesaw State University is a public |
| Bungie | Bungie Studios is an American video game developer. They are most famous for developing the award winning Halo series of video games. They also made Destiny. The studio was founded |
| Mona Lisa | The Mona Lisa is a world-renowned painting created by |
| Harry Potter Series | The Harry Potter series, written by J.K. Rowling, begins with the book titled |
| Riddle | Question: I have cities, but no houses. I have mountains, but no trees. I have water, but no fish. What am I? Answer: |
| Photosynthesis | The process of photosynthesis involves the conversion of |
| Story Continuation | Jane went to the store to buy some groceries. She picked up apples, oranges, and a loaf of bread. When she got home, she realized she forgot |
| Math Problem | Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph, and another train leaves Station B at 10:00 AM and travels at 80 mph, when will they meet if the distance between the stations is 300 miles? To determine |
| Algorithm Definition | In the context of computer programming, an algorithm is |