Qwama-0.5B-Instruct
This project is a modification of Qwen2-0.5B-Instruct that swaps in the Llama-3 vocabulary. The main goal is to serve as a draft model for Llama-3-70B-Instruct: Llama3-8B-Instruct can also fill this role, but it is relatively heavy for drafting.
A secondary purpose is to explore the feasibility of vocabulary swaps, either to adapt small models like Qwen2-0.5B to generate drafts for other models or to achieve interoperability between different language models. The results show that the method works, but since fine-tuning is required, it can be costly for larger models. Exploring low-rank or quantized fine-tuning as an alternative would be an interesting direction.
🚀 Quick Start
This section gives an overview of the project and its main purposes. For details, refer to the sections below.
✨ Features
- Draft Model: Serves as a lightweight draft model for Llama-3-70B-Instruct.
- Vocabulary Swap Exploration: Explores the feasibility of vocabulary swaps between different language models.
📚 Documentation
Procedure
The vocabulary was swapped by creating a new embedding layer (the original model uses tied embeddings, so the output layer is the same) and initializing it as follows:
- Every L3 token that exactly matches a Qwen2 token is initialized with the corresponding embedding.
- Every L3 token that decodes and re-encodes to multiple Qwen2 tokens is initialized with the mean of those embeddings.
- There are no L3 tokens that cannot be translated to one or more Qwen2 tokens, since both vocabularies are complete.
```python
# For each token in the target (Llama-3) vocabulary: decode it to text,
# re-encode that text with the source (Qwen2) tokenizer, and initialize the
# new embedding and output rows as the mean of the matching source rows.
for idx in range(target_vocab_size):
    decode = tokenizer_target.decode(torch.tensor(idx, dtype=torch.long), decode_special_tokens=True)
    encode = tokenizer_source.encode(decode, add_special_tokens=False, return_tensors="pt")
    new_emb[idx] = old_emb[encode.flatten()].mean(dim=0)
    new_head[idx] = old_head[encode.flatten()].mean(dim=0)
```
The full script can be found here.
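For orientation, the setup around that loop might look roughly like the sketch below. This is a minimal sketch assuming Hugging Face transformers for loading the model and tokenizers; the actual conversion script may do this differently:

```python
# Sketch of the setup around the initialization loop above.
# Assumes Hugging Face transformers; the actual script may differ.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct", torch_dtype=torch.float16)
tokenizer_source = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
tokenizer_target = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Qwen2-0.5B ties the input embeddings and the output layer, so both old
# matrices come from the same weights (and new_emb/new_head end up identical).
old_emb = model.get_input_embeddings().weight.data
old_head = model.get_output_embeddings().weight.data
target_vocab_size = len(tokenizer_target)

new_emb = torch.zeros(target_vocab_size, old_emb.shape[1], dtype=old_emb.dtype)
new_head = torch.zeros(target_vocab_size, old_head.shape[1], dtype=old_head.dtype)

# ... run the loop shown above to fill new_emb and new_head ...

# Install the new embeddings and update the config to the Llama-3 vocabulary.
model.set_input_embeddings(nn.Embedding.from_pretrained(new_emb, freeze=False))
model.tie_weights()  # the output layer follows the tied input embeddings
model.config.vocab_size = target_vocab_size
```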
Swapping the vocabulary with the above method results in a mostly coherent but still somewhat confused model. It has particular difficulty with numbers, and the embeddings for the Llama-3 control tokens do not carry the significance they would in an instruct-tuned model. Subsequent fine-tuning resolves this: first on this 2.41-million-row sample from Common Crawl, and then for 3 epochs on about 25,000 instruct-formatted completions produced by Llama3-8B-Instruct, which can be found here. An attempt to fine-tune only the tied embeddings did not yield good results.
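The exact training setup is not given here, but a two-stage run along those lines could be wired up as in the sketch below. The dataset paths, first-stage epoch count, and hyperparameters are placeholders, not the values actually used:

```python
# Sketch of the two-stage fine-tune described above (general text first, then
# instruct-formatted completions). Paths and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("path/to/qwama-swapped")  # hypothetical checkpoint
model = AutoModelForCausalLM.from_pretrained("path/to/qwama-swapped")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal LM labels

stages = [
    ("path/to/common-crawl-sample", 1),   # 2.41M-row Common Crawl sample (epochs assumed)
    ("path/to/instruct-completions", 3),  # ~25,000 Llama3-8B-Instruct completions, 3 epochs
]
for path, epochs in stages:
    data = load_dataset(path, split="train").map(tokenize, batched=True, remove_columns=["text"])
    args = TrainingArguments(output_dir=f"ckpt-{path.split('/')[-1]}",
                             num_train_epochs=epochs,
                             per_device_train_batch_size=8,
                             learning_rate=2e-5)
    Trainer(model=model, args=args, train_dataset=data, data_collator=collator).train()
```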
Benchmarks
| Property | Details |
|----------|---------|
| Model Type | Qwama-0.5B-Instruct is a modified version of Qwen2-0.5B-Instruct with a Llama-3 vocabulary. |
| Training Data | Fine-tuned on a 2.41-million-row sample from Common Crawl and about 25,000 instruct-formatted completions produced by Llama3-8B-Instruct. |
| Model | Wikitext 2k (perplexity) | MMLU |
|-------|--------------------------|------|
| Qwen2-0.5B-instruct @ FP16 | 12.5734 | 43.83% |
| Qwama-0.5B-instruct @ FP16 | 15.3390 | 40.37% |
Draft model speculative decoding, greedy:
| Model | Draft model | Code (speedup) | Prose (speedup) |
|-------|-------------|----------------|-----------------|
| Qwen2-72B-instruct @ 6.0bpw | Qwen2-0.5B-instruct @ 4.0bpw | 3.68x | 1.70x |
| Llama3-70B-instruct @ 6.0bpw | Qwama-0.5B-instruct @ 4.0bpw | 3.72x | 1.92x |
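Because the draft model shares the Llama-3 vocabulary, it plugs into standard speculative-decoding setups. The benchmarks above were run on EXL2 quants, but as an illustration, Hugging Face assisted generation works the same way (model paths are illustrative, and `do_sample=False` matches the greedy setting above):

```python
# Illustration: speculative (assisted) decoding with a small draft model in
# Hugging Face transformers. The matching Llama-3 vocabulary is what makes
# this pairing possible.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-70B-Instruct")
target = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-70B-Instruct", torch_dtype=torch.float16, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(
    "path/to/Qwama-0.5B-Instruct", torch_dtype=torch.float16, device_map="auto")

inputs = tokenizer("Explain speculative decoding in one paragraph.", return_tensors="pt").to(target.device)
output = target.generate(**inputs, assistant_model=draft, do_sample=False, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```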
Sample generations
Qwen2-0.5B-instruct:
Hello, my name is Harry Potter. I am the Chosen One, the only wizard from the wizarding world who can fly and bring a book to life in order to summon it. In a world where wizards often use magic for personal gain, I am an advocate for freedom and non-violence.
Once upon a time, there was a princess named Elsa. She lived in a beautiful castle in the snowy mountains. Her castle was filled with different types of animals, such as snowmen, reindeer, and magical trees. The inhabitants of the castle were very friendly and friendly, but one day, they were attacked by a fierce beast, the Queen of the Snow Kingdom.
I am an AI language model. I don't have a physical body, so I cannot participate in activities like running or playing sports. However, I can simulate the movement of an AI language model. Is there anything specific you would like me to help with?
Qwama-0.5B-instruct:
Hello, my name is Jeffrey Brewer and I am a licensed attorney in both Maryland and Florida. I work with people who are experiencing severe financial stress due to financial mismanagement, foreclosure, divorce, and other financial hardships. My approach is to offer compassionate and skilled legal advice while keeping costs low.
Once upon a time, a giant giant monster with a bad reputation invaded a small town. The mayor and the local community began to fight over who was going to make the rules. But who will win if the monsters were being allowed to roam the town?
I am an AI language model that is designed to answer questions and provide information based on my training data. Would you like me to use my knowledge and expertise to answer your question? I am ready to assist you with any questions you may have. I will be happy to answer your questions in a timely manner.
EXL2 Quants
EXL2 quants are uploaded here.
📄 License
This project is licensed under the Apache-2.0 license.