# Achieve the Optimal Merged Model by Using One Base Model and Two Fine-tuned Models!

This project aims to find the best way to merge one base model with two fine-tuned models, presenting the optimal results from numerous merging experiments.
## Quick Start

The optimal merged models are available at the following links:

## Features
### Previous Generation Formula
```yaml
models:
  - model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-7B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-7B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
```
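For intuition, a della-style merge adds each model's task vector (fine-tuned weights minus base weights) back onto the base, pruned to the given `density` and scaled by `lambda`. The numpy sketch below is illustrative only, not mergekit's implementation, and its uniform random pruning simplifies della's magnitude-based dropout; with `density: 1`, as in the formula above, every delta entry is kept:

```python
import numpy as np

def della_merge(base, tuned, weights, density=1.0, lam=0.9, seed=0):
    """Sketch of a della-style merge for a single weight tensor.

    base:    base-model tensor
    tuned:   list of fine-tuned tensors
    weights: per-model merge weights
    density: fraction of delta entries kept (1.0 keeps all)
    lam:     scale applied to the merged delta
    """
    rng = np.random.default_rng(seed)
    merged_delta = np.zeros_like(base)
    for t, w in zip(tuned, weights):
        delta = t - base                      # task vector
        if density < 1.0:
            # simplified dropout (real della biases drops by magnitude),
            # rescaled so the expected value of the delta is preserved
            keep = rng.random(delta.shape) < density
            delta = np.where(keep, delta / density, 0.0)
        merged_delta += w * delta
    merged_delta /= sum(weights)              # normalize: true
    return base + lam * merged_delta

base = np.array([1.0, 2.0, 3.0])
a = np.array([1.5, 2.0, 2.5])
b = np.array([1.0, 3.0, 3.5])
print(della_merge(base, [a, b], weights=[1, 1]))
```

With `density: 1` this reduces to a weighted average of the task vectors scaled by `lambda`, which is why the previous-generation formula leans so heavily on the instruct models' deltas.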
This formula was widely used when merging the previous generation of models, but it has two main deficiencies:

- It retains relatively little of the base model's knowledge.
- Mathematical and coding abilities decline.
### Current Generation Formula
```yaml
models:
  - model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-7B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: float16
tokenizer_source: base
name: Qwen2.5-7B-della
---
models:
  - model: Qwen/Qwen2.5-7B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-7B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: float16
tokenizer_source: base
name: Qwen2.5-7B-della-1M
---
models:
  - model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      density: 1
      weight: 1
merge_method: ties
base_model: Qwen/Qwen2.5-7B
parameters:
  density: 1
  weight: 1
  normalize: true
  int8_mask: true
dtype: float16
tokenizer_source: base
name: Qwen2.5-7B-ties
---
models:
  - model: Qwen/Qwen2.5-7B-Instruct-1M
    parameters:
      density: 1
      weight: 1
merge_method: ties
base_model: Qwen/Qwen2.5-7B
parameters:
  density: 1
  weight: 1
  normalize: true
  int8_mask: true
dtype: float16
tokenizer_source: base
name: Qwen2.5-7B-ties-1M
---
merge_method: model_stock
base_model: Qwen/Qwen2.5-7B
models:
  - model: mergekit-community/Qwen2.5-7B-della
  - model: mergekit-community/Qwen2.5-7B-della-1M
  - model: mergekit-community/Qwen2.5-7B-ties
  - model: mergekit-community/Qwen2.5-7B-ties-1M
  - model: Qwen/Qwen2.5-7B-Instruct-1M
  - model: Qwen/Qwen2.5-7B-Instruct
tokenizer_source: base
int8_mask: true
normalize: true
dtype: float16
```
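The ties stages in the formula above behave differently from della: ties elects a per-parameter sign across the task vectors and discards deltas that disagree with it, which reduces interference between models. A minimal illustrative sketch of that sign-election arithmetic (not mergekit's code):

```python
import numpy as np

def ties_merge(base, tuned, weights, lam=1.0):
    """Sketch of a ties-style merge for one tensor: elect a sign per
    parameter, then average only the deltas agreeing with that sign."""
    deltas = [w * (t - base) for t, w in zip(tuned, weights)]
    stacked = np.stack(deltas)
    # elect the dominant sign per parameter by total signed mass
    elected = np.sign(stacked.sum(axis=0))
    # zero out deltas whose sign disagrees with the elected sign
    agree = np.where(np.sign(stacked) == elected, stacked, 0.0)
    count = np.maximum((np.sign(stacked) == elected).sum(axis=0), 1)
    merged_delta = agree.sum(axis=0) / count  # mean over agreeing models
    return base + lam * merged_delta

base = np.zeros(3)
a = np.array([1.0, -1.0, 0.5])
b = np.array([1.0, 1.0, -0.5])
print(ties_merge(base, [a, b], weights=[1, 1]))
```

Parameters where the two models pull in opposite directions cancel rather than average, which is why the formula keeps both della and ties variants of each instruct model before the final stage.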
Aside from a slight decrease in instruction following, this formula achieves significant improvements in all other aspects. It will also be used in the development of the next generation of YOYO models.
## Documentation

YOYO-AI not only releases merged models with excellent performance but also publishes the complete, high-quality merging formulas behind them, hoping to advance model merging technology in the open-source community.
## Support

Using this formula in your own merges is the greatest support you can give YOYO-AI!
## License

This project is licensed under the apache-2.0 license.
## Model Information

| Property | Details |
|----------|---------|
| Base Model | mergekit-community/Qwen2.5-7B-della, mergekit-community/Qwen2.5-7B-ties, Qwen/Qwen2.5-7B-Instruct, Qwen/Qwen2.5-7B-Instruct-1M, mergekit-community/Qwen2.5-7B-ties-1M, Qwen/Qwen2.5-7B, mergekit-community/Qwen2.5-7B-della-1M |
| Library Name | transformers |
| Tags | mergekit, merge |
| License | apache-2.0 |
| Language | en, zh |
| Pipeline Tag | text-generation |