# 🚀 Qwama-0.5B-Instruct
Qwama-0.5B-Instruct is Qwen2-0.5B-Instruct with the Llama-3 vocabulary. The model serves two main purposes: as a draft model for Llama-3-70B-Instruct, and as an exploration of the feasibility of vocabulary swapping.
## 🚀 Quick Start
## ✨ Key Features
- Draft model: serves as a draft model for Llama-3-70B-Instruct. Llama3-8B-Instruct can also fill this role, but it is computationally expensive for the drafting stage.
- Vocabulary swapping: explores the feasibility of swapping vocabularies, both to let a small model like Qwen2-0.5B produce drafts for other models and to enable interoperability between different language models. The approach does require finetuning, which becomes expensive for larger models; low-rank or quantized finetuning could be explored as cheaper alternatives.
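The draft-model use case relies on greedy speculative decoding: the small model cheaply proposes a run of tokens, and the large model keeps the longest verified prefix plus one corrected token. A minimal sketch of that loop, with hypothetical stand-in "models" (deterministic next-token functions rather than real networks):

```python
# Toy greedy speculative decoding. The draft model is fast but occasionally
# wrong; the target model is authoritative. The output always equals the
# target model's own greedy output, just reached in fewer target calls.

def draft_model(ctx):
    # Hypothetical cheap model: guesses last + 1, but errs on multiples of 4.
    return ctx[-1] + (2 if ctx[-1] % 4 == 0 else 1)

def target_model(ctx):
    # Hypothetical large model: always continues the sequence by 1.
    return ctx[-1] + 1

def speculative_decode(ctx, n_tokens, k=4):
    ctx = list(ctx)
    while len(ctx) < n_tokens:
        # 1) Draft k tokens greedily with the cheap model.
        proposal = []
        for _ in range(k):
            proposal.append(draft_model(ctx + proposal))
        # 2) Verify: accept draft tokens until the first mismatch, at which
        #    point the target model contributes one corrected token.
        for tok in proposal:
            verified = target_model(ctx)
            ctx.append(verified)
            if verified != tok or len(ctx) >= n_tokens:
                break
    return ctx[:n_tokens]

print(speculative_decode([1], 8))  # → [1, 2, 3, 4, 5, 6, 7, 8]
```

The better the draft model predicts the target (which is why Qwama needs the Llama-3 vocabulary), the more proposed tokens survive verification and the larger the speedup.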
## 📦 Procedure
The vocabulary swap is performed by creating a new embedding layer (the original model uses tied embeddings, so the output layer is the same matrix) and initializing it as follows:
- Every L3 token that exactly matches a Qwen2 token is initialized with the corresponding embedding.
- Every L3 token that decodes and re-encodes to multiple Qwen2 tokens is initialized with the mean of those embeddings.
- There are no L3 tokens that fail to convert to one or more Qwen2 tokens, since both vocabularies are complete.
```python
for idx in range(target_vocab_size):
    # Decode the L3 token to text, then re-encode that text with the Qwen2 tokenizer
    decode = tokenizer_target.decode(torch.tensor(idx, dtype = torch.long), decode_special_tokens = True)
    encode = tokenizer_source.encode(decode, add_special_tokens = False, return_tensors = "pt")
    # Initialize the embedding and output-head rows with the mean of the source rows
    # (an exact single-token match reduces to a straight copy)
    new_emb[idx] = old_emb[encode.flatten()].mean(dim = 0)
    new_head[idx] = old_head[encode.flatten()].mean(dim = 0)
```
The full script is available here.
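As a self-contained illustration of the initialization rules above, here is a toy version using a made-up four-token source vocabulary and a hypothetical character-level tokenizer; the real script operates on the actual Qwen2 and Llama-3 tokenizers and embedding matrices:

```python
# Hypothetical source (Qwen2-like) vocabulary with 2-dim embeddings.
source_vocab = {"a": 0, "b": 1, "c": 2, "d": 3}
old_emb = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]

# Hypothetical target (Llama-3-like) vocabulary: "a" matches a source token
# exactly, while "cd" re-encodes to two source tokens.
target_vocab = ["a", "cd"]

def encode_source(text):
    # Tokenizer stand-in: every character is one source token.
    return [source_vocab[ch] for ch in text]

new_emb = []
for token in target_vocab:
    ids = encode_source(token)
    rows = [old_emb[i] for i in ids]
    # Mean of the source embeddings; an exact match is just a copy.
    new_emb.append([sum(col) / len(rows) for col in zip(*rows)])

print(new_emb)  # → [[1.0, 0.0], [3.0, 1.0]]
```

"a" copies its source row unchanged, while "cd" receives the element-wise mean of the "c" and "d" rows, mirroring the two initialization cases in the bullet list above.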
The model produced by this swap is mostly coherent, but still quite confused. It performs especially poorly on numbers, and the embeddings for the Llama-3 control tokens do not carry the significance they should in an instruct-tuned model.
These issues are addressed by subsequent finetuning, first on a sample of 2.41 million rows from Common Crawl, and then for 3 epochs on roughly 25,000 instruct-formatted completions generated by Llama3-8B-Instruct, which can be found here.
## 🔧 Technical Details
Finetuning only the tied embeddings was attempted, but it did not work well.
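For context, "tied" means the output head shares its weight matrix with the input embedding, so updating the embeddings necessarily moves the output projection too. A pure-Python sketch with a hypothetical three-token vocabulary and 2-dim hidden states:

```python
# One embedding row per vocab token; with tying, this same matrix also
# serves as the output head.
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

def embed(token_id):
    # Input side: look up the token's embedding row.
    return emb[token_id]

def logits(hidden):
    # Tied output side: score each token by its dot product with hidden.
    return [sum(h * w for h, w in zip(hidden, row)) for row in emb]

# Updating one embedding row changes both the input representation and the
# output scores, because the two layers share storage.
emb[0][0] = 5.0
assert embed(0) == [5.0, 0.0]
assert logits([1.0, 0.0])[0] == 5.0
```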
## 📚 Benchmarks
| Model | Wikitext 2k | MMLU |
|-------|-------------|------|
| Qwen2-0.5B-instruct @ FP16 | 12.5734 | 43.83% |
| Qwama-0.5B-instruct @ FP16 | 15.3390 | 40.37% |
Speculative decoding with the draft models (greedy decoding):
| Model | Draft model | Speedup (code) | Speedup (prose) |
|-------|-------------|----------------|-----------------|
| Qwen2-72B-instruct @ 6.0bpw | Qwen2-0.5B-instruct @ 4.0bpw | 3.68x | 1.70x |
| Llama3-70B-instruct @ 6.0bpw | Qwama-0.5B-instruct @ 4.0bpw | 3.72x | 1.92x |
## 💻 Sample Generations
Qwen2-0.5B-instruct samples:

> Hello, my name is Harry Potter. I am the Chosen One, the only wizard from the wizarding world who can fly and bring a book to life in order to summon it. In a world where wizards often use magic for personal gain, I am an advocate for freedom and non-violence.

> Once upon a time, there was a princess named Elsa. She lived in a beautiful castle in the snowy mountains. Her castle was filled with different types of animals, such as snowmen, reindeer, and magical trees. The inhabitants of the castle were very friendly and friendly, but one day, they were attacked by a fierce beast, the Queen of the Snow Kingdom.

> I am an AI language model. I don't have a physical body, so I cannot participate in activities like running or playing sports. However, I can simulate the movement of an AI language model. Is there anything specific you would like me to help with?
Qwama-0.5B-instruct samples:

> Hello, my name is Jeffrey Brewer and I am a licensed attorney in both Maryland and Florida. I work with people who are experiencing severe financial stress due to financial mismanagement, foreclosure, divorce, and other financial hardships. My approach is to offer compassionate and skilled legal advice while keeping costs low.

> Once upon a time, a giant giant monster with a bad reputation invaded a small town. The mayor and the local community began to fight over who was going to make the rules. But who will win if the monsters were being allowed to roam the town?

> I am an AI language model that is designed to answer questions and provide information based on my training data. Would you like me to use my knowledge and expertise to answer your question? I am ready to assist you with any questions you may have. I will be happy to answer your questions in a timely manner.
## EXL2 Quantized Models
EXL2 quantized models have been uploaded here.
## 📄 License
This project is released under the Apache-2.0 license.