# 🚀 Qwama-0.5B-Instruct
Qwama-0.5B-Instruct is Qwen2-0.5B-Instruct with the Llama-3 vocabulary. The model serves two main purposes: as a draft model for Llama-3-70B-Instruct, and as an exploration of the feasibility of vocabulary swapping.
## 🚀 Quick Start
## ✨ Key Features
- Draft model: serves as a draft model for Llama-3-70B-Instruct. Llama3-8B-Instruct can also fill this role, but it is computationally expensive for the drafting stage.
- Vocabulary swapping: explores the feasibility of swapping vocabularies, both to let a small model like Qwen2-0.5B produce drafts for other models and to enable interoperability between different language models. The approach does require finetuning, which becomes expensive for larger models; low-rank or quantized finetuning could be explored as cheaper alternatives.
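The draft-model use case relies on greedy speculative decoding: the small model cheaply proposes a run of tokens, and the large model keeps the longest verified prefix plus one corrected token. A minimal sketch of that loop, with hypothetical stand-in "models" (deterministic next-token functions rather than real networks):

```python
# Toy greedy speculative decoding. The draft model is fast but occasionally
# wrong; the target model is authoritative. The output always equals the
# target model's own greedy output, just reached in fewer target calls.

def draft_model(ctx):
    # Hypothetical cheap model: guesses last + 1, but errs on multiples of 4.
    return ctx[-1] + (2 if ctx[-1] % 4 == 0 else 1)

def target_model(ctx):
    # Hypothetical large model: always continues the sequence by 1.
    return ctx[-1] + 1

def speculative_decode(ctx, n_tokens, k=4):
    ctx = list(ctx)
    while len(ctx) < n_tokens:
        # 1) Draft k tokens greedily with the cheap model.
        proposal = []
        for _ in range(k):
            proposal.append(draft_model(ctx + proposal))
        # 2) Verify: accept draft tokens until the first mismatch, at which
        #    point the target model contributes one corrected token.
        for tok in proposal:
            verified = target_model(ctx)
            ctx.append(verified)
            if verified != tok or len(ctx) >= n_tokens:
                break
    return ctx[:n_tokens]

print(speculative_decode([1], 8))  # → [1, 2, 3, 4, 5, 6, 7, 8]
```

The better the draft model predicts the target (which is why Qwama needs the Llama-3 vocabulary), the more proposed tokens survive verification and the larger the speedup.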
## 📦 Procedure
The vocabulary swap is performed by creating a new embedding layer (the original model uses tied embeddings, so the output layer is the same matrix) and initializing it as follows:
- Every L3 token that exactly matches a Qwen2 token is initialized with the corresponding embedding.
- Every L3 token that decodes and re-encodes to multiple Qwen2 tokens is initialized with the mean of those embeddings.
- There are no L3 tokens that fail to convert to one or more Qwen2 tokens, since both vocabularies are complete.
```python
for idx in range(target_vocab_size):
    # Decode the L3 token to text, then re-encode that text with the Qwen2 tokenizer
    decode = tokenizer_target.decode(torch.tensor(idx, dtype = torch.long), decode_special_tokens = True)
    encode = tokenizer_source.encode(decode, add_special_tokens = False, return_tensors = "pt")
    # Initialize the embedding and output-head rows with the mean of the source rows
    # (an exact single-token match reduces to a straight copy)
    new_emb[idx] = old_emb[encode.flatten()].mean(dim = 0)
    new_head[idx] = old_head[encode.flatten()].mean(dim = 0)
```
The full script is available here.
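As a self-contained illustration of the initialization rules above, here is a toy version using a made-up four-token source vocabulary and a hypothetical character-level tokenizer; the real script operates on the actual Qwen2 and Llama-3 tokenizers and embedding matrices:

```python
# Hypothetical source (Qwen2-like) vocabulary with 2-dim embeddings.
source_vocab = {"a": 0, "b": 1, "c": 2, "d": 3}
old_emb = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]

# Hypothetical target (Llama-3-like) vocabulary: "a" matches a source token
# exactly, while "cd" re-encodes to two source tokens.
target_vocab = ["a", "cd"]

def encode_source(text):
    # Tokenizer stand-in: every character is one source token.
    return [source_vocab[ch] for ch in text]

new_emb = []
for token in target_vocab:
    ids = encode_source(token)
    rows = [old_emb[i] for i in ids]
    # Mean of the source embeddings; an exact match is just a copy.
    new_emb.append([sum(col) / len(rows) for col in zip(*rows)])

print(new_emb)  # → [[1.0, 0.0], [3.0, 1.0]]
```

"a" copies its source row unchanged, while "cd" receives the element-wise mean of the "c" and "d" rows, mirroring the two initialization cases in the bullet list above.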
The model produced by this swap is mostly coherent, but still quite confused. It performs especially poorly on numbers, and the embeddings for the Llama-3 control tokens do not carry the significance they should in an instruct-tuned model.
These issues are addressed by subsequent finetuning, first on a sample of 2.41 million rows from Common Crawl, and then for 3 epochs on roughly 25,000 instruct-formatted completions generated by Llama3-8B-Instruct, which can be found here.
## 🔧 Technical Details
Finetuning only the tied embeddings was attempted, but it did not work well.
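For context, "tied" means the output head shares its weight matrix with the input embedding, so updating the embeddings necessarily moves the output projection too. A pure-Python sketch with a hypothetical three-token vocabulary and 2-dim hidden states:

```python
# One embedding row per vocab token; with tying, this same matrix also
# serves as the output head.
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

def embed(token_id):
    # Input side: look up the token's embedding row.
    return emb[token_id]

def logits(hidden):
    # Tied output side: score each token by its dot product with hidden.
    return [sum(h * w for h, w in zip(hidden, row)) for row in emb]

# Updating one embedding row changes both the input representation and the
# output scores, because the two layers share storage.
emb[0][0] = 5.0
assert embed(0) == [5.0, 0.0]
assert logits([1.0, 0.0])[0] == 5.0
```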
## 📚 Benchmarks
| Model | Wikitext 2k | MMLU |
|-------|-------------|------|
| Qwen2-0.5B-instruct @ FP16 | 12.5734 | 43.83% |
| Qwama-0.5B-instruct @ FP16 | 15.3390 | 40.37% |
Speculative decoding with the draft models (greedy decoding):
| Model | Draft model | Speedup (code) | Speedup (prose) |
|-------|-------------|----------------|-----------------|
| Qwen2-72B-instruct @ 6.0bpw | Qwen2-0.5B-instruct @ 4.0bpw | 3.68x | 1.70x |
| Llama3-70B-instruct @ 6.0bpw | Qwama-0.5B-instruct @ 4.0bpw | 3.72x | 1.92x |
## 💻 Sample Generations
Qwen2-0.5B-instruct samples:

> Hello, my name is Harry Potter. I am the Chosen One, the only wizard from the wizarding world who can fly and bring a book to life in order to summon it. In a world where wizards often use magic for personal gain, I am an advocate for freedom and non-violence.

> Once upon a time, there was a princess named Elsa. She lived in a beautiful castle in the snowy mountains. Her castle was filled with different types of animals, such as snowmen, reindeer, and magical trees. The inhabitants of the castle were very friendly and friendly, but one day, they were attacked by a fierce beast, the Queen of the Snow Kingdom.

> I am an AI language model. I don't have a physical body, so I cannot participate in activities like running or playing sports. However, I can simulate the movement of an AI language model. Is there anything specific you would like me to help with?
Qwama-0.5B-instruct samples:

> Hello, my name is Jeffrey Brewer and I am a licensed attorney in both Maryland and Florida. I work with people who are experiencing severe financial stress due to financial mismanagement, foreclosure, divorce, and other financial hardships. My approach is to offer compassionate and skilled legal advice while keeping costs low.

> Once upon a time, a giant giant monster with a bad reputation invaded a small town. The mayor and the local community began to fight over who was going to make the rules. But who will win if the monsters were being allowed to roam the town?

> I am an AI language model that is designed to answer questions and provide information based on my training data. Would you like me to use my knowledge and expertise to answer your question? I am ready to assist you with any questions you may have. I will be happy to answer your questions in a timely manner.
## EXL2 Quantized Models
EXL2 quantized models have been uploaded here.
## 📄 License
This project is released under the Apache-2.0 license.