# Achieve the Optimal Merged Model by Using One Base Model and Two Fine-tuned Models!

This project aims to find the best way to merge one base model with two fine-tuned models, presenting the optimal results from numerous merging experiments.
## Quick Start

The optimal merged models are available at the following links:

## Features
### Previous Generation Formula
```yaml
models:
  - model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-7B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-7B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
```
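For intuition, a della-style merge adds each model's task vector (fine-tuned weights minus base weights) back onto the base, pruned to the given `density` and scaled by `lambda`. The numpy sketch below is illustrative only, not mergekit's implementation, and its uniform random pruning simplifies della's magnitude-based dropout; with `density: 1`, as in the formula above, every delta entry is kept:

```python
import numpy as np

def della_merge(base, tuned, weights, density=1.0, lam=0.9, seed=0):
    """Sketch of a della-style merge for a single weight tensor.

    base:    base-model tensor
    tuned:   list of fine-tuned tensors
    weights: per-model merge weights
    density: fraction of delta entries kept (1.0 keeps all)
    lam:     scale applied to the merged delta
    """
    rng = np.random.default_rng(seed)
    merged_delta = np.zeros_like(base)
    for t, w in zip(tuned, weights):
        delta = t - base                      # task vector
        if density < 1.0:
            # simplified dropout (real della biases drops by magnitude),
            # rescaled so the expected value of the delta is preserved
            keep = rng.random(delta.shape) < density
            delta = np.where(keep, delta / density, 0.0)
        merged_delta += w * delta
    merged_delta /= sum(weights)              # normalize: true
    return base + lam * merged_delta

base = np.array([1.0, 2.0, 3.0])
a = np.array([1.5, 2.0, 2.5])
b = np.array([1.0, 3.0, 3.5])
print(della_merge(base, [a, b], weights=[1, 1]))
```

With `density: 1` this reduces to a weighted average of the task vectors scaled by `lambda`, which is why the previous-generation formula leans so heavily on the instruct models' deltas.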
This formula was widely used when merging the previous generation of models, but it has two main deficiencies:

- It retains relatively little of the base model's knowledge.
- Mathematical and coding abilities decline.
### Current Generation Formula
```yaml
models:
  - model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-7B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: float16
tokenizer_source: base
name: Qwen2.5-7B-della
---
models:
  - model: Qwen/Qwen2.5-7B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-7B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: float16
tokenizer_source: base
name: Qwen2.5-7B-della-1M
---
models:
  - model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      density: 1
      weight: 1
merge_method: ties
base_model: Qwen/Qwen2.5-7B
parameters:
  density: 1
  weight: 1
  normalize: true
  int8_mask: true
dtype: float16
tokenizer_source: base
name: Qwen2.5-7B-ties
---
models:
  - model: Qwen/Qwen2.5-7B-Instruct-1M
    parameters:
      density: 1
      weight: 1
merge_method: ties
base_model: Qwen/Qwen2.5-7B
parameters:
  density: 1
  weight: 1
  normalize: true
  int8_mask: true
dtype: float16
tokenizer_source: base
name: Qwen2.5-7B-ties-1M
---
merge_method: model_stock
base_model: Qwen/Qwen2.5-7B
models:
  - model: mergekit-community/Qwen2.5-7B-della
  - model: mergekit-community/Qwen2.5-7B-della-1M
  - model: mergekit-community/Qwen2.5-7B-ties
  - model: mergekit-community/Qwen2.5-7B-ties-1M
  - model: Qwen/Qwen2.5-7B-Instruct-1M
  - model: Qwen/Qwen2.5-7B-Instruct
tokenizer_source: base
int8_mask: true
normalize: true
dtype: float16
```
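The ties stages in the formula above behave differently from della: ties elects a per-parameter sign across the task vectors and discards deltas that disagree with it, which reduces interference between models. A minimal illustrative sketch of that sign-election arithmetic (not mergekit's code):

```python
import numpy as np

def ties_merge(base, tuned, weights, lam=1.0):
    """Sketch of a ties-style merge for one tensor: elect a sign per
    parameter, then average only the deltas agreeing with that sign."""
    deltas = [w * (t - base) for t, w in zip(tuned, weights)]
    stacked = np.stack(deltas)
    # elect the dominant sign per parameter by total signed mass
    elected = np.sign(stacked.sum(axis=0))
    # zero out deltas whose sign disagrees with the elected sign
    agree = np.where(np.sign(stacked) == elected, stacked, 0.0)
    count = np.maximum((np.sign(stacked) == elected).sum(axis=0), 1)
    merged_delta = agree.sum(axis=0) / count  # mean over agreeing models
    return base + lam * merged_delta

base = np.zeros(3)
a = np.array([1.0, -1.0, 0.5])
b = np.array([1.0, 1.0, -0.5])
print(ties_merge(base, [a, b], weights=[1, 1]))
```

Parameters where the two models pull in opposite directions cancel rather than average, which is why the formula keeps both della and ties variants of each instruct model before the final stage.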
Aside from a slight decrease in instruction following, this formula achieves significant improvements in all other aspects. It will also be used in the development of the next generation of YOYO models.
## Documentation

YOYO-AI not only releases merged models with excellent performance but also publishes the complete, high-quality merging formulas behind them, hoping to advance model merging technology in the open-source community.
## Support

Using this formula in your own merges is the greatest support you can give YOYO-AI!
## License

This project is licensed under the apache-2.0 license.
## Model Information

| Property | Details |
|----------|---------|
| Base Model | mergekit-community/Qwen2.5-7B-della, mergekit-community/Qwen2.5-7B-ties, Qwen/Qwen2.5-7B-Instruct, Qwen/Qwen2.5-7B-Instruct-1M, mergekit-community/Qwen2.5-7B-ties-1M, Qwen/Qwen2.5-7B, mergekit-community/Qwen2.5-7B-della-1M |
| Library Name | transformers |
| Tags | mergekit, merge |
| License | apache-2.0 |
| Language | en, zh |
| Pipeline Tag | text-generation |