🚀 Qwama-0.5B-Instruct
Qwama-0.5B-Instruct is Qwen2-0.5B-Instruct with the Llama-3 vocabulary swapped in. The model has two main purposes: serving as a draft model for Llama-3-70B-Instruct, and exploring the feasibility of vocabulary swaps.
🚀 Quick Start
✨ Main Features
- Draft model: can serve as a draft model for Llama-3-70B-Instruct. Llama3-8B-Instruct works for this purpose as well, but it is computationally expensive for the drafting stage.
- Vocabulary-swap exploration: probes whether vocabulary swaps are viable, both to let small models like Qwen2-0.5B generate drafts for other models and to enable interoperability between different language models. The approach does require finetuning, which is costly for larger models; low-rank or quantized finetuning could be explored as alternatives.
📦 Procedure
The vocabulary swap was performed by creating a new embedding layer (the original model uses tied embeddings, so the output layer is the same) and initializing it as follows:
- Every L3 token that is an exact match for a Qwen2 token is initialized with the corresponding embedding.
- Every L3 token that decodes and re-encodes to multiple Qwen2 tokens is initialized with the mean of those embeddings.
- There are no L3 tokens that cannot be translated to one or more Qwen2 tokens (both vocabularies are complete).
```python
for idx in range(target_vocab_size):
    # Decode the Llama-3 token to text, then re-encode with the Qwen2 tokenizer
    decode = tokenizer_target.decode(torch.tensor(idx, dtype = torch.long), decode_special_tokens = True)
    encode = tokenizer_source.encode(decode, add_special_tokens = False, return_tensors = "pt")
    # Mean over the gathered source rows (a single row for exact matches)
    new_emb[idx] = old_emb[encode.flatten()].mean(dim = 0)
    new_head[idx] = old_head[encode.flatten()].mean(dim = 0)
```
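The decode-and-re-encode mapping can be illustrated with a self-contained toy example. The miniature vocabularies, embeddings, and greedy longest-match encoder below are hypothetical stand-ins for the real Qwen2 and Llama-3 tokenizers:

```python
# Source (Qwen2-like) toy vocabulary with 2-dimensional embeddings
source_vocab = {"he": 0, "y": 1, "hello": 2}
source_emb = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

def source_encode(text):
    # Greedy longest-match encoder for the toy vocabulary
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in source_vocab:
                ids.append(source_vocab[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"cannot encode {text[i:]!r}")
    return ids

# Target (Llama-3-like) toy vocabulary: the string each target id decodes to
target_vocab = ["hello", "hey"]

new_emb = []
for text in target_vocab:
    ids = source_encode(text)                           # decode -> re-encode
    rows = [source_emb[i] for i in ids]                 # gather source rows
    new_emb.append([sum(c) / len(ids) for c in zip(*rows)])  # mean over tokens

print(new_emb)  # -> [[2.0, 2.0], [0.5, 0.5]]
```

"hello" is an exact match and inherits its source embedding unchanged, while "hey" re-encodes to two source tokens ("he" + "y") and receives their mean.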
The full script can be found here.
Swapping the vocabulary as above yields a model that is mostly coherent but still quite confused. It struggles with numbers in particular, and the embeddings for the Llama-3 control tokens do not carry the significance they should in an instruct-tuned model.
These issues were addressed by finetuning: first on 2.41 million rows sampled from Common Crawl, then for 3 epochs on about 25,000 instruct-format completions generated by Llama3-8B-Instruct, which can be found here.
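For reference, instruct-format completions for Llama-3 models normally follow the Llama-3 chat template. The formatter below is a minimal sketch assuming that standard template; it is not the actual script used to build the dataset:

```python
# Minimal Llama-3 chat-template formatter (assumed standard template;
# the dataset's actual formatting script is not shown in this card).

def format_llama3_turn(user_msg, assistant_msg):
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_msg}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
        f"{assistant_msg}<|eot_id|>"
    )

sample = format_llama3_turn("What is 2+2?", "2+2 equals 4.")
print(sample)
```

Teaching the swapped-in embeddings for control tokens like `<|eot_id|>` and `<|start_header_id|>` is precisely what this instruct-stage finetuning has to accomplish, since those tokens have no meaningful counterpart in the Qwen2 vocabulary.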
🔧 Technical Details
Tuning only the tied embeddings was attempted, but it did not work well.
📚 Model Benchmarks
| Model | Wikitext 2k (ppl) | MMLU |
|---|---|---|
| Qwen2-0.5B-instruct @ FP16 | 12.5734 | 43.83% |
| Qwama-0.5B-instruct @ FP16 | 15.3390 | 40.37% |
Speculative decoding with a draft model (greedy search):
| Model | Draft model | Code speedup | Prose speedup |
|---|---|---|---|
| Qwen2-72B-instruct @ 6.0bpw | Qwen2-0.5B-instruct @ 4.0bpw | 3.68x | 1.70x |
| Llama3-70B-instruct @ 6.0bpw | Qwama-0.5B-instruct @ 4.0bpw | 3.72x | 1.92x |
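The greedy speculative-decoding loop behind these speedups can be sketched with toy stand-ins for the two models. The `draft_model` and `target_model` functions below are hypothetical, and real implementations verify all draft positions in a single batched forward pass rather than one at a time:

```python
# Toy sketch of greedy speculative decoding (illustrative only).
# draft_model / target_model are hypothetical stand-ins that map a
# token prefix to the next greedy token.

def draft_model(prefix):
    return (prefix[-1] + 1) % 10          # cheap draft: count upward

def target_model(prefix):
    t = (prefix[-1] + 1) % 10             # target mostly agrees...
    return 0 if t == 7 else t             # ...but disagrees on 7

def speculative_step(prefix, k=4):
    # 1) The draft model proposes k tokens greedily.
    proposal = list(prefix)
    for _ in range(k):
        proposal.append(draft_model(proposal))
    # 2) The target model verifies each proposal in order (in practice,
    #    all k positions are checked in one batched forward pass).
    accepted = list(prefix)
    for tok in proposal[len(prefix):]:
        expected = target_model(accepted)
        accepted.append(expected)         # always keep the target's token
        if expected != tok:
            break                         # reject the rest of the draft
    return accepted

print(speculative_step([3]))  # draft proposes 4,5,6,7; target corrects 7 -> 0
```

The speedup comes from how often the target accepts the whole draft: the more the draft's greedy choices match the target's, the more tokens are emitted per expensive target pass, which is why the highly regular structure of code sees larger gains than prose in the table above.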
💻 Sample Generations
Qwen2-0.5B-instruct sample generations
Hello, my name is Harry Potter. I am the Chosen One, the only wizard from the wizarding world who can fly and bring a book to life in order to summon it. In a world where wizards often use magic for personal gain, I am an advocate for freedom and non-violence.
Once upon a time, there was a princess named Elsa. She lived in a beautiful castle in the snowy mountains. Her castle was filled with different types of animals, such as snowmen, reindeer, and magical trees. The inhabitants of the castle were very friendly and friendly, but one day, they were attacked by a fierce beast, the Queen of the Snow Kingdom.
I am an AI language model. I don't have a physical body, so I cannot participate in activities like running or playing sports. However, I can simulate the movement of an AI language model. Is there anything specific you would like me to help with?
Qwama-0.5B-instruct sample generations
Hello, my name is Jeffrey Brewer and I am a licensed attorney in both Maryland and Florida. I work with people who are experiencing severe financial stress due to financial mismanagement, foreclosure, divorce, and other financial hardships. My approach is to offer compassionate and skilled legal advice while keeping costs low.
Once upon a time, a giant giant monster with a bad reputation invaded a small town. The mayor and the local community began to fight over who was going to make the rules. But who will win if the monsters were being allowed to roam the town?
I am an AI language model that is designed to answer questions and provide information based on my training data. Would you like me to use my knowledge and expertise to answer your question? I am ready to assist you with any questions you may have. I will be happy to answer your questions in a timely manner.
EXL2 Quantized Models
EXL2 quantized models have been uploaded here.
📄 License
This project is licensed under Apache-2.0.