🚀 Qwama-0.5B-Instruct
Qwama-0.5B-Instruct is Qwen2-0.5B-Instruct with the Llama-3 vocabulary swapped in. The model has two main purposes: serving as a draft model for Llama-3-70B-Instruct, and exploring the feasibility of vocabulary swaps.
🚀 Quick Start
✨ Main Features
- Draft model: can serve as a draft model for Llama-3-70B-Instruct. Llama3-8B-Instruct works for this purpose as well, but it is computationally expensive for the drafting stage.
- Vocabulary-swap exploration: probes whether vocabulary swaps are viable, both to let small models like Qwen2-0.5B generate drafts for other models and to enable interoperability between different language models. The approach does require finetuning, which is costly for larger models; low-rank or quantized finetuning could be explored as alternatives.
📦 Procedure
The vocabulary swap was performed by creating a new embedding layer (the original model uses tied embeddings, so the output layer is the same) and initializing it as follows:
- Every L3 token that is an exact match for a Qwen2 token is initialized with the corresponding embedding.
- Every L3 token that decodes and re-encodes to multiple Qwen2 tokens is initialized with the mean of those embeddings.
- There are no L3 tokens that cannot be translated to one or more Qwen2 tokens (both vocabularies are complete).
```python
for idx in range(target_vocab_size):
    # Decode the Llama-3 token to text, then re-encode with the Qwen2 tokenizer
    decode = tokenizer_target.decode(torch.tensor(idx, dtype = torch.long), decode_special_tokens = True)
    encode = tokenizer_source.encode(decode, add_special_tokens = False, return_tensors = "pt")
    # Mean over the gathered source rows (a single row for exact matches)
    new_emb[idx] = old_emb[encode.flatten()].mean(dim = 0)
    new_head[idx] = old_head[encode.flatten()].mean(dim = 0)
```
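The decode-and-re-encode mapping can be illustrated with a self-contained toy example. The miniature vocabularies, embeddings, and greedy longest-match encoder below are hypothetical stand-ins for the real Qwen2 and Llama-3 tokenizers:

```python
# Source (Qwen2-like) toy vocabulary with 2-dimensional embeddings
source_vocab = {"he": 0, "y": 1, "hello": 2}
source_emb = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

def source_encode(text):
    # Greedy longest-match encoder for the toy vocabulary
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in source_vocab:
                ids.append(source_vocab[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"cannot encode {text[i:]!r}")
    return ids

# Target (Llama-3-like) toy vocabulary: the string each target id decodes to
target_vocab = ["hello", "hey"]

new_emb = []
for text in target_vocab:
    ids = source_encode(text)                           # decode -> re-encode
    rows = [source_emb[i] for i in ids]                 # gather source rows
    new_emb.append([sum(c) / len(ids) for c in zip(*rows)])  # mean over tokens

print(new_emb)  # -> [[2.0, 2.0], [0.5, 0.5]]
```

"hello" is an exact match and inherits its source embedding unchanged, while "hey" re-encodes to two source tokens ("he" + "y") and receives their mean.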
The full script can be found here.
Swapping the vocabulary as above yields a model that is mostly coherent but still quite confused. It struggles with numbers in particular, and the embeddings for the Llama-3 control tokens do not carry the significance they should in an instruct-tuned model.
These issues were addressed by finetuning: first on 2.41 million rows sampled from Common Crawl, then for 3 epochs on about 25,000 instruct-format completions generated by Llama3-8B-Instruct, which can be found here.
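For reference, instruct-format completions for Llama-3 models normally follow the Llama-3 chat template. The formatter below is a minimal sketch assuming that standard template; it is not the actual script used to build the dataset:

```python
# Minimal Llama-3 chat-template formatter (assumed standard template;
# the dataset's actual formatting script is not shown in this card).

def format_llama3_turn(user_msg, assistant_msg):
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_msg}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
        f"{assistant_msg}<|eot_id|>"
    )

sample = format_llama3_turn("What is 2+2?", "2+2 equals 4.")
print(sample)
```

Teaching the swapped-in embeddings for control tokens like `<|eot_id|>` and `<|start_header_id|>` is precisely what this instruct-stage finetuning has to accomplish, since those tokens have no meaningful counterpart in the Qwen2 vocabulary.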
🔧 Technical Details
Tuning only the tied embeddings was attempted, but it did not work well.
📚 Model Benchmarks
| Model | Wikitext 2k (ppl) | MMLU |
|---|---|---|
| Qwen2-0.5B-instruct @ FP16 | 12.5734 | 43.83% |
| Qwama-0.5B-instruct @ FP16 | 15.3390 | 40.37% |
Speculative decoding with a draft model (greedy search):
| Model | Draft model | Code speedup | Prose speedup |
|---|---|---|---|
| Qwen2-72B-instruct @ 6.0bpw | Qwen2-0.5B-instruct @ 4.0bpw | 3.68x | 1.70x |
| Llama3-70B-instruct @ 6.0bpw | Qwama-0.5B-instruct @ 4.0bpw | 3.72x | 1.92x |
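The greedy speculative-decoding loop behind these speedups can be sketched with toy stand-ins for the two models. The `draft_model` and `target_model` functions below are hypothetical, and real implementations verify all draft positions in a single batched forward pass rather than one at a time:

```python
# Toy sketch of greedy speculative decoding (illustrative only).
# draft_model / target_model are hypothetical stand-ins that map a
# token prefix to the next greedy token.

def draft_model(prefix):
    return (prefix[-1] + 1) % 10          # cheap draft: count upward

def target_model(prefix):
    t = (prefix[-1] + 1) % 10             # target mostly agrees...
    return 0 if t == 7 else t             # ...but disagrees on 7

def speculative_step(prefix, k=4):
    # 1) The draft model proposes k tokens greedily.
    proposal = list(prefix)
    for _ in range(k):
        proposal.append(draft_model(proposal))
    # 2) The target model verifies each proposal in order (in practice,
    #    all k positions are checked in one batched forward pass).
    accepted = list(prefix)
    for tok in proposal[len(prefix):]:
        expected = target_model(accepted)
        accepted.append(expected)         # always keep the target's token
        if expected != tok:
            break                         # reject the rest of the draft
    return accepted

print(speculative_step([3]))  # draft proposes 4,5,6,7; target corrects 7 -> 0
```

The speedup comes from how often the target accepts the whole draft: the more the draft's greedy choices match the target's, the more tokens are emitted per expensive target pass, which is why the highly regular structure of code sees larger gains than prose in the table above.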
💻 Sample Generations
Qwen2-0.5B-instruct sample generations
Hello, my name is Harry Potter. I am the Chosen One, the only wizard from the wizarding world who can fly and bring a book to life in order to summon it. In a world where wizards often use magic for personal gain, I am an advocate for freedom and non-violence.
Once upon a time, there was a princess named Elsa. She lived in a beautiful castle in the snowy mountains. Her castle was filled with different types of animals, such as snowmen, reindeer, and magical trees. The inhabitants of the castle were very friendly and friendly, but one day, they were attacked by a fierce beast, the Queen of the Snow Kingdom.
I am an AI language model. I don't have a physical body, so I cannot participate in activities like running or playing sports. However, I can simulate the movement of an AI language model. Is there anything specific you would like me to help with?
Qwama-0.5B-instruct sample generations
Hello, my name is Jeffrey Brewer and I am a licensed attorney in both Maryland and Florida. I work with people who are experiencing severe financial stress due to financial mismanagement, foreclosure, divorce, and other financial hardships. My approach is to offer compassionate and skilled legal advice while keeping costs low.
Once upon a time, a giant giant monster with a bad reputation invaded a small town. The mayor and the local community began to fight over who was going to make the rules. But who will win if the monsters were being allowed to roam the town?
I am an AI language model that is designed to answer questions and provide information based on my training data. Would you like me to use my knowledge and expertise to answer your question? I am ready to assist you with any questions you may have. I will be happy to answer your questions in a timely manner.
EXL2 Quantized Models
EXL2 quantized models have been uploaded here.
📄 License
This project is licensed under Apache-2.0.