ZYH-LLM-Qwen2.5-14B-V4: Open-Source Large Language Model

Developed by YOYO-AI

ZYH-LLM-Qwen2.5-14B-V4 is a large language model built on Qwen2.5-14B. It improves computational accuracy and reasoning ability through multi-stage model merging that draws on R1 distillation models.

Large Language Model

Safetensors

Supports multiple languages

Open-source license: Apache-2.0

Tags: Instruction-enhanced reasoning, Multi-stage distillation, Long context support

Downloads: 1,235

Release date: 3/12/2025

Model Overview

This model improves computational accuracy and reasoning ability by increasing the proportion of the R1 distillation model in the merge recipe, while preserving instruction-following ability and general capabilities. It is suitable for a wide range of natural language processing tasks.

Model Features

Multi-stage model merging

Uses a multi-stage merging strategy to combine the strengths of several instruction models and a code model

Enhanced reasoning ability

Improves computational accuracy and reasoning ability by increasing the proportion of the R1 distillation model

Long context support

Supports long-context processing of up to 1 million tokens

Instruction following

Retains strong instruction-following ability and general capabilities

Model Capabilities

Text generation

Mathematical calculation

Code understanding and generation

Complex reasoning

Long text processing

Multi-round dialogue

Use Cases

Education

Mathematical problem solving

Solves complex mathematical problems and calculations

Scored 53.93 on MATH Lvl 5 (4-shot)

Programming

Code generation and explanation

Generates and explains code

Research

Scientific problem solving

Answers scientific questions in specialized fields

Scored 8.61 on GPQA (0-shot)

🚀 ZYH-LLM-Qwen2.5-14B-V4

ZYH-LLM-Qwen2.5-14B-V4 increases the proportion of the R1 distillation model in the model merging recipe while maintaining the model's instruction-following ability and general capabilities.

🚀 Quick Start

The fifth-generation model in the ZYH-LLM-Qwen2.5 series has been released.

✨ Features

Increases the proportion of the R1 distillation model in the model merging recipe.
Maintains the model's instruction-following ability and general capabilities.
Improves calculation accuracy and reasoning ability without reducing the general capabilities of the instruction model.

📚 Documentation

Merge Template

merge_method: model_stock  
base_model: Instruction Model  
models:  
  - model: Instruction Fine-tuning Model 1  
  - model: Instruction Fine-tuning Model 2  
  - model: Inference Fine-tuning Model 1  
  - model: Inference Fine-tuning Model 2  
dtype: bfloat16  
tokenizer_source: base  
int8_mask: true  
normalize: true

Merging with the template above can improve the model's calculation accuracy and reasoning ability without reducing the general capabilities of the instruction model. ZYH-LLM-Qwen2.5-14B-V4 used this template during its merging process.
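The template can also be filled in programmatically. The following is a minimal Python sketch, not part of the original recipe: `render_model_stock_config` is a hypothetical helper that emits a config in the template's field order, and the example values are copied from the Qwen2.5-14B-mst-Coder config in the second stage.

```python
def render_model_stock_config(name, base_model, models):
    """Render a model_stock merge config using the template's fields (hypothetical helper)."""
    lines = [
        "merge_method: model_stock",
        f"base_model: {base_model}",
        "models:",
    ]
    # One list entry per model to be merged onto the base
    lines += [f"  - model: {model}" for model in models]
    lines += [
        "dtype: bfloat16",
        "tokenizer_source: base",
        "int8_mask: true",
        "normalize: true",
        f"name: {name}",
    ]
    return "\n".join(lines) + "\n"


# Values taken from the Qwen2.5-14B-mst-Coder config in the second stage
config_text = render_model_stock_config(
    name="Qwen2.5-14B-mst-Coder",
    base_model="Qwen2.5-14B-della-base",
    models=[
        "Qwen2.5-Coder-14B-della",
        "Qwen2.5-14B-della-v2",
        "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
        "huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2",
    ],
)
print(config_text)
```

The sketch builds the text by hand so that it needs no third-party YAML library. It only reproduces the layout of the configs in this document and does not validate them against any merge tool.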

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Benchmark	Score
Avg.	43.14
IFEval (0-shot)	83.65
BBH (3-shot)	50.27
MATH Lvl 5 (4-shot)	53.93
GPQA (0-shot)	8.61
MuSR (0-shot)	15.66
MMLU-PRO (5-shot)	46.71
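
The reported average is consistent with an unweighted mean of the six benchmark scores. A quick check in Python, using only the numbers from the table (the variable names are illustrative):

```python
# Scores copied from the Open LLM Leaderboard table above
scores = {
    "IFEval": 83.65,
    "BBH": 50.27,
    "MATH Lvl 5": 53.93,
    "GPQA": 8.61,
    "MuSR": 15.66,
    "MMLU-PRO": 46.71,
}

# Unweighted arithmetic mean over the six benchmarks
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 43.14
```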

Model Merging Stages

First stage

Create four different instruction models and one code model.

models:  
  - model: Qwen/Qwen2.5-14B-Instruct  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9  
merge_method: della  
base_model: Qwen/Qwen2.5-14B  
parameters:  
  density: 1  
  weight: 1  
  lambda: 0.9  
  normalize: true  
  int8_mask: true  
dtype: bfloat16  
tokenizer_source: base  
name: Qwen2.5-14B-della-base

models:  
  - model: Qwen/Qwen2.5-14B-Instruct  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9  
merge_method: della  
base_model: arcee-ai/Virtuoso-Small-v2  
parameters:  
  density: 1  
  weight: 1  
  lambda: 0.9  
  normalize: true  
  int8_mask: true  
dtype: bfloat16  
tokenizer_source: base  
name: Qwen2.5-14B-della-v2

models:  
  - model: Qwen/Qwen2.5-14B-Instruct  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9  
merge_method: della  
base_model: arcee-ai/SuperNova-Medius  
parameters:  
  density: 1  
  weight: 1  
  lambda: 0.9  
  normalize: true  
  int8_mask: true  
dtype: bfloat16  
tokenizer_source: base  
name: Qwen2.5-14B-della-Nova

models:  
  - model: Qwen/Qwen2.5-14B-Instruct  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9  
merge_method: della  
base_model: Azure99/Blossom-V6-14B  
parameters:  
  density: 1  
  weight: 1  
  lambda: 0.9  
  normalize: true  
  int8_mask: true  
dtype: bfloat16  
tokenizer_source: base  
name: Qwen2.5-14B-della-V6

models:  
  - model: Qwen/Qwen2.5-Coder-14B-Instruct  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9  
merge_method: della  
base_model: Qwen/Qwen2.5-Coder-14B  
parameters:  
  density: 1  
  weight: 1  
  lambda: 0.9  
  normalize: true  
  int8_mask: true  
dtype: bfloat16  
tokenizer_source: base  
name: Qwen2.5-Coder-14B-della

Second stage

Step 1

Using the merge template, create three instruction models biased toward reasoning.

merge_method: model_stock  
base_model: Qwen2.5-14B-della-base  
models:  
  - model: Qwen2.5-Coder-14B-della  
  - model: Qwen2.5-14B-della-v2  
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B  
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2  
dtype: bfloat16  
tokenizer_source: base  
int8_mask: true  
normalize: true  
name: Qwen2.5-14B-mst-Coder

merge_method: model_stock  
base_model: Qwen2.5-14B-della-base  
models:  
  - model: Qwen2.5-14B-della-V6  
  - model: Qwen2.5-14B-della-v2  
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B  
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2  
dtype: bfloat16  
tokenizer_source: base  
int8_mask: true  
normalize: true  
name: Qwen2.5-14B-mst-V6

merge_method: model_stock  
base_model: Qwen2.5-14B-della-base  
models:  
  - model: Qwen2.5-14B-della-Nova  
  - model: Qwen2.5-14B-della-v2  
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B  
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2  
dtype: bfloat16  
tokenizer_source: base  
int8_mask: true  
normalize: true  
name: Qwen2.5-14B-mst-Nova

Step 2

Create a pure instruction model to restore the generality of the final model.

merge_method: model_stock  
base_model: Qwen2.5-14B-della-base  
models:  
  - model: Qwen2.5-14B-della-Nova  
  - model: Qwen2.5-14B-della-v2  
  - model: Qwen2.5-14B-della-V6   
dtype: bfloat16  
tokenizer_source: base  
int8_mask: true  
normalize: true  
name: Qwen2.5-14B-mst-it

Third stage

Create a base model with a 1-million-token context window.

merge_method: sce  
models:
  # Pivot model
  - model: Qwen/Qwen2.5-14B-Instruct-1M
  # Target models  
  - model: Qwen/Qwen2.5-14B  
base_model: Qwen/Qwen2.5-14B-Instruct-1M  
parameters:  
  select_topk: 1  
dtype: bfloat16  
tokenizer_source: base  
normalize: true  
int8_mask: true  
name: Qwen2.5-14B-1M

models:  
  - model: Qwen/Qwen2.5-14B-Instruct  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9  
merge_method: della  
base_model: Qwen2.5-14B-1M  
parameters:  
  density: 1  
  weight: 1  
  lambda: 0.9  
  normalize: true  
  int8_mask: true  
dtype: bfloat16  
tokenizer_source: base  
name: Qwen2.5-14B-della-1M

Final stage

Merge the three reasoning-biased models and the pure instruction model onto the 1M-context della base to produce the final model.

merge_method: model_stock  
base_model: Qwen2.5-14B-della-1M  
models:  
  - model: Qwen2.5-14B-mst-Coder  
  - model: Qwen2.5-14B-mst-V6  
  - model: Qwen2.5-14B-mst-Nova  
  - model: Qwen2.5-14B-mst-it  
dtype: bfloat16  
tokenizer_source: base  
int8_mask: true  
normalize: true  
name: ZYH-LLM-Qwen2.5-14B-V4
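
Because each stage consumes the outputs of earlier stages, the configs have to be run in dependency order. The sketch below records which models each config reads, as listed in the configs above, and derives one valid build order with Python's standard `graphlib` module (Python 3.9+). The dictionary layout and variable names are illustrative assumptions, not part of the original recipe.

```python
from graphlib import TopologicalSorter

INSTRUCT = "Qwen/Qwen2.5-14B-Instruct"
INSTRUCT_1M = "Qwen/Qwen2.5-14B-Instruct-1M"
R1 = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"
R1_ABL = "huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2"

# Merge output -> models its config reads (base_model plus the models list)
merge_inputs = {
    # First stage
    "Qwen2.5-14B-della-base": {"Qwen/Qwen2.5-14B", INSTRUCT, INSTRUCT_1M},
    "Qwen2.5-14B-della-v2": {"arcee-ai/Virtuoso-Small-v2", INSTRUCT, INSTRUCT_1M},
    "Qwen2.5-14B-della-Nova": {"arcee-ai/SuperNova-Medius", INSTRUCT, INSTRUCT_1M},
    "Qwen2.5-14B-della-V6": {"Azure99/Blossom-V6-14B", INSTRUCT, INSTRUCT_1M},
    "Qwen2.5-Coder-14B-della": {"Qwen/Qwen2.5-Coder-14B", "Qwen/Qwen2.5-Coder-14B-Instruct"},
    # Second stage
    "Qwen2.5-14B-mst-Coder": {"Qwen2.5-14B-della-base", "Qwen2.5-Coder-14B-della",
                              "Qwen2.5-14B-della-v2", R1, R1_ABL},
    "Qwen2.5-14B-mst-V6": {"Qwen2.5-14B-della-base", "Qwen2.5-14B-della-V6",
                           "Qwen2.5-14B-della-v2", R1, R1_ABL},
    "Qwen2.5-14B-mst-Nova": {"Qwen2.5-14B-della-base", "Qwen2.5-14B-della-Nova",
                             "Qwen2.5-14B-della-v2", R1, R1_ABL},
    "Qwen2.5-14B-mst-it": {"Qwen2.5-14B-della-base", "Qwen2.5-14B-della-Nova",
                           "Qwen2.5-14B-della-v2", "Qwen2.5-14B-della-V6"},
    # Third stage
    "Qwen2.5-14B-1M": {INSTRUCT_1M, "Qwen/Qwen2.5-14B"},
    "Qwen2.5-14B-della-1M": {"Qwen2.5-14B-1M", INSTRUCT, INSTRUCT_1M},
    # Final stage
    "ZYH-LLM-Qwen2.5-14B-V4": {"Qwen2.5-14B-della-1M", "Qwen2.5-14B-mst-Coder",
                               "Qwen2.5-14B-mst-V6", "Qwen2.5-14B-mst-Nova",
                               "Qwen2.5-14B-mst-it"},
}

# static_order() yields every node after its inputs; keep only the merge outputs
build_order = [m for m in TopologicalSorter(merge_inputs).static_order() if m in merge_inputs]

for step, model in enumerate(build_order, start=1):
    print(step, model)
```

Any order the sorter returns builds an intermediate model before the configs that read it, and ZYH-LLM-Qwen2.5-14B-V4 always comes last. The relative order of independent merges, such as the four first-stage instruction models, may differ between runs.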

📄 License

This project is licensed under the Apache-2.0 license.
