smol_llama-220M-GQA open-source text generation model - efficient text creation for free


Smol Llama 220M GQA

Developed by BEE-spoke-data

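smol_llama is a small 220M-parameter decoder-only model with GQA (grouped-query attention), suited to tasks such as text generation.

Large language model

Transformers

Language: English | License: Apache-2.0 | #lightweight-text-generation #GQA-attention #instruction-tuning

Downloads: 3,633

Release date: 12/22/2023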

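Model overview

This is a small decoder-only model with 220M parameters in total, and it is the first version of the model. It can be used for tasks such as text generation.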

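Model features

Small and efficient

A lightweight 220M-parameter model that can be trained from scratch on a single GPU

GQA mechanism

Uses grouped-query attention (32 attention heads sharing 8 key-value heads) to improve inference efficiency

Context length

Supports a context length of 2048 tokens

Multiple fine-tunes

Several fine-tuned versions are available, including instruction-tuned and code-generation variants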
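The GQA feature above can be illustrated with a minimal pure-Python sketch of grouped-query attention for a single decoding position. It assumes the common GQA layout in which the 32 query heads are split into 8 contiguous groups of 4 and each group reads one shared key/value head; `kv_head_for` and `gqa_attend` are hypothetical helpers, not code from the model, and real implementations do the same thing with batched tensor operations.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def kv_head_for(q_head, n_heads=32, n_kv_heads=8):
    # Hypothetical helper: contiguous groups of n_heads // n_kv_heads query heads
    # share one key/value head (query heads 0-3 -> KV head 0, 4-7 -> KV head 1, ...)
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be divisible by n_kv_heads")
    return q_head // (n_heads // n_kv_heads)


def gqa_attend(queries, keys, values):
    """Grouped-query attention for one position.

    queries: [n_heads][head_dim]              one query vector per query head
    keys:    [n_kv_heads][seq_len][head_dim]  cached keys per KV head
    values:  [n_kv_heads][seq_len][head_dim]  cached values per KV head
    Returns: [n_heads][head_dim]
    """
    n_heads, n_kv_heads = len(queries), len(keys)
    head_dim = len(queries[0])
    scale = 1.0 / math.sqrt(head_dim)
    outputs = []
    for h, q in enumerate(queries):
        g = kv_head_for(h, n_heads, n_kv_heads)
        # Scaled dot-product scores against the shared KV head's keys
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys[g]]
        weights = softmax(scores)
        # Weighted sum of the shared KV head's values
        out = [sum(w * v[d] for w, v in zip(weights, values[g])) for d in range(head_dim)]
        outputs.append(out)
    return outputs
```

Because only 8 key/value heads are projected and cached instead of 32, the K/V work and memory shrink by a factor of 4; setting `n_kv_heads` equal to the number of query heads turns the same function into ordinary multi-head attention.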

Model capabilities

Text generation

Instruction following

Code generation

Question answering

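Use cases

General text generation

Story continuation

Continue a story from a given opening

As shown in the 'Story Continuation' example, the model can continue a story coherently

Knowledge Q&A

Answer fact-based questions

As shown in the 'Photosynthesis' example, the model can give broadly correct factual answers

Education

Math problem solving

Solve basic math problems

As shown in the 'Math Problem' example, the model can understand and attempt to solve math problems

Entertainment

Riddle solving

Solve riddles and brain teasers

As shown in the 'Riddle' example, the model can understand and attempt to solve riddles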

🚀 smol_llama: 220M GQA


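🚀 Quick start

Model details

Hidden size of 1024, with 10 layers.
Uses GQA (32 attention heads, 8 key-value heads) with a context length of 2048.
Can be trained from scratch on a single GPU 😊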
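The numbers above allow a rough estimate of the key/value cache that GQA saves during generation. The sketch assumes a head dimension of 1024 / 32 = 32 (the usual Llama convention, not stated on this page), 2-byte (fp16/bf16) cache entries, batch size 1 and a full 2048-token sequence; `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2, batch=1):
    # Keys and values (factor 2) stored per layer, per KV head, per position
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value * batch


HIDDEN, LAYERS, HEADS, KV_HEADS, CTX = 1024, 10, 32, 8, 2048
HEAD_DIM = HIDDEN // HEADS  # assumed: 32

gqa = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CTX)
mha = kv_cache_bytes(LAYERS, HEADS, HEAD_DIM, CTX)  # if every query head had its own K/V
print(f"GQA: {gqa / 2**20:.0f} MiB, MHA: {mha / 2**20:.0f} MiB, ratio {mha // gqa}x")
# prints: GQA: 20 MiB, MHA: 80 MiB, ratio 4x
```

The 4x ratio follows directly from 32 / 8 query heads per KV head; the absolute sizes scale linearly with batch size and sequence length.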

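Model links

Here are some of the fine-tunes we have made, but there are many more possibilities!

Instruction-tuned models
- openhermes - link
- open-instruct - link
Code models
- Python (PyPI) - link
Zephyr DPO fine-tunes
- SFT - link
- Full DPO - link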

📚 Documentation

Open LLM Leaderboard evaluation results (v1 benchmarks)

Detailed results can be found here

Metric	Value
Average	29.44
AI2 Reasoning Challenge (25-shot)	24.83
HellaSwag (10-shot)	29.76
MMLU (5-shot)	25.85
TruthfulQA (0-shot)	44.55
Winogrande (5-shot)	50.99
GSM8k (5-shot)	0.68
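The average in the table above can be reproduced as the unweighted mean of the six benchmark scores:

```python
# Scores copied from the table above
scores = {
    "AI2 Reasoning Challenge (25-shot)": 24.83,
    "HellaSwag (10-shot)": 29.76,
    "MMLU (5-shot)": 25.85,
    "TruthfulQA (0-shot)": 44.55,
    "Winogrande (5-shot)": 50.99,
    "GSM8k (5-shot)": 0.68,
}
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 29.44
```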

Open LLM Leaderboard evaluation results (v2 benchmarks)

Detailed results can be found here

Metric	Value
Average	6.62
IFEval (0-shot)	23.86
BBH (3-shot)	3.04
MATH Lvl 5 (4-shot)	0.00
GPQA (0-shot)	0.78
MuSR (0-shot)	9.07
MMLU-PRO (5-shot)	1.66

📄 License

This project is licensed under the Apache 2.0 license.

📦 Datasets

JeanKaddour/minipile
pszemraj/simple_wikipedia_LM
mattymchen/refinedweb-3m
BEE-spoke-data/knowledge-inoc-concat-v1

💻 Inference parameters

{
    "parameters": {
        "max_new_tokens": 64,
        "do_sample": true,
        "temperature": 0.8,
        "repetition_penalty": 1.05,
        "no_repeat_ngram_size": 4,
        "eta_cutoff": 0.0006,
        "renormalize_logits": true
    }
}
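The parameters above configure sampling rather than greedy decoding (`do_sample: true`). The following pure-Python sketch applies them to one decoding step's logits, roughly in the order Hugging Face transformers uses (repetition penalty and n-gram blocking first, then temperature, then eta truncation). It is a simplified illustration, not the library's implementation; `apply_repetition_penalty`, `banned_by_ngram`, `eta_filter`, `process_logits` and `sample_next` are hypothetical names, and the final renormalization in `eta_filter` stands in for `renormalize_logits`. A full generation loop would repeat the step up to `max_new_tokens` (64) times.

```python
import math
import random


def apply_repetition_penalty(logits, seen_ids, penalty):
    # CTRL-style penalty: push down the logit of every token already in the context
    out = list(logits)
    for t in set(seen_ids):
        out[t] = out[t] * penalty if out[t] < 0 else out[t] / penalty
    return out


def banned_by_ngram(seen_ids, n):
    # Tokens that would complete an n-gram already present in the context
    if len(seen_ids) + 1 < n:
        return set()
    prefix = tuple(seen_ids[len(seen_ids) - (n - 1):])
    banned = set()
    for i in range(len(seen_ids) - n + 1):
        if tuple(seen_ids[i:i + n - 1]) == prefix:
            banned.add(seen_ids[i + n - 1])
    return banned


def softmax(xs):
    # exp(-inf) is 0.0, so banned tokens get zero probability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def eta_filter(probs, epsilon):
    # Eta sampling: drop tokens with probability below
    # eta = min(epsilon, sqrt(epsilon) * exp(-entropy)), always keeping the top token
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    eta = min(epsilon, math.sqrt(epsilon) * math.exp(-entropy))
    top = max(range(len(probs)), key=probs.__getitem__)
    kept = [p if (p >= eta or i == top) else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]  # renormalize the surviving probability mass


def process_logits(logits, seen_ids, params):
    logits = apply_repetition_penalty(logits, seen_ids, params["repetition_penalty"])
    for t in banned_by_ngram(seen_ids, params["no_repeat_ngram_size"]):
        logits[t] = -math.inf
    probs = softmax([x / params["temperature"] for x in logits])
    return eta_filter(probs, params["eta_cutoff"])


def sample_next(logits, seen_ids, params, rng=random):
    # do_sample=True: draw from the filtered distribution instead of taking the argmax
    probs = process_logits(logits, seen_ids, params)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


PARAMS = {"temperature": 0.8, "repetition_penalty": 1.05,
          "no_repeat_ngram_size": 4, "eta_cutoff": 0.0006}
```

In Hugging Face transformers, all keys in the JSON above (`max_new_tokens`, `do_sample`, `temperature`, `repetition_penalty`, `no_repeat_ngram_size`, `eta_cutoff`, `renormalize_logits`) are generation-config options that can be passed as keyword arguments to `model.generate(...)`.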

📋 Inference examples

Example title	Input text
El Microondas	My name is El Microondas the Wise, and
Kennesaw State University	Kennesaw State University is a public
Bungie	Bungie Studios is an American video game developer. They are most famous for developing the award winning Halo series of video games. They also made Destiny. The studio was founded
Mona Lisa	The Mona Lisa is a world-renowned painting created by
Harry Potter Series	The Harry Potter series, written by J.K. Rowling, begins with the book titled
Riddle	'Question: I have cities, but no houses. I have mountains, but no trees. I have water, but no fish. What am I? Answer:'
Photosynthesis	The process of photosynthesis involves the conversion of
Story Continuation	Jane went to the store to buy some groceries. She picked up apples, oranges, and a loaf of bread. When she got home, she realized she forgot
Math Problem	'Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph, and another train leaves Station B at 10:00 AM and travels at 80 mph, when will they meet if the distance between the stations is 300 miles? To determine'
Algorithm Definition	In the context of computer programming, an algorithm is
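A minimal sketch for trying some of the example prompts above with Hugging Face transformers and the inference parameters listed earlier. It assumes the Hub id `BEE-spoke-data/smol_llama-220M-GQA` (inferred from the model name and developer shown on this page) and that `transformers` and `torch` are installed; the model is only downloaded when `generate` is called. `fit_to_context`, `generate`, `GEN_KWARGS` and `EXAMPLE_PROMPTS` are hypothetical names for illustration.

```python
CONTEXT_LEN = 2048
# Copied from the inference parameters above
GEN_KWARGS = {
    "max_new_tokens": 64,
    "do_sample": True,
    "temperature": 0.8,
    "repetition_penalty": 1.05,
    "no_repeat_ngram_size": 4,
    "eta_cutoff": 0.0006,
    "renormalize_logits": True,
}
EXAMPLE_PROMPTS = {
    "El Microondas": "My name is El Microondas the Wise, and",
    "Photosynthesis": "The process of photosynthesis involves the conversion of",
    "Algorithm Definition": "In the context of computer programming, an algorithm is",
}


def fit_to_context(input_ids, max_new_tokens=GEN_KWARGS["max_new_tokens"], context_len=CONTEXT_LEN):
    # Keep the most recent tokens so prompt + new tokens fit in the 2048-token window
    # (for very long prompts this can drop a leading BOS token)
    budget = context_len - max_new_tokens
    return input_ids[-budget:] if len(input_ids) > budget else input_ids


def generate(prompt, model_id="BEE-spoke-data/smol_llama-220M-GQA"):
    # Imported lazily so the helpers above work without transformers installed
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    ids = fit_to_context(tokenizer(prompt)["input_ids"])
    input_ids = torch.tensor([ids])
    output = model.generate(input_ids, attention_mask=torch.ones_like(input_ids), **GEN_KWARGS)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

For example, `print(generate(EXAMPLE_PROMPTS["Photosynthesis"]))` prints the prompt followed by up to 64 sampled tokens; because sampling is enabled, the continuation differs from run to run.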