ThinkEdit-deepseek-qwen-14b Open-source Model - Optimize the Inference Thinking Chain and Improve Inference Accuracy

Thinkedit Deepseek Qwen 14b

Developed by cesun

ThinkEdit is a lightweight weight editing method that identifies and edits a small number of attention heads to mitigate the issue of large language models generating overly short reasoning chains in inference tasks, thereby improving reasoning accuracy.

Large Language Model

Transformers

Open Source License:Other #Reasoning Optimization #Weight Editing #Mathematical Reasoning

Downloads 46

Release Time : 3/14/2025

Model Overview

This model is based on deepseek-qwen-14b and focuses on addressing the accuracy decline caused by models generating overly short reasoning chains. Through interpretable weight editing techniques, it significantly enhances performance in tasks such as mathematical reasoning.

Model Features

Lightweight Weight Editing

Edits only about 0.1% of total parameters, achieving performance improvement by modifying a small number of attention heads.

Short Reasoning Mitigation

Specifically optimized to address the issue of models generating overly short reasoning chains.

Interpretability

Can identify approximately 2% of 'short reasoning' attention heads with clear editing directions.

Performance Improvement

Significantly improves accuracy on multiple mathematical reasoning datasets, especially in cases of short reasoning.

Model Capabilities

Mathematical Problem Solving

Complex Reasoning Task Handling

Reasoning Chain Generation

Educational Applications

Use Cases

Education

Math Problem Solving

Solves math problems from elementary to high school difficulty levels.

Achieves 93.5% accuracy on the GSM8K dataset.

Academic Assessment

Used for elementary math evaluation in MMLU.

Accuracy improved to 96.53%.

Research

Model Behavior Research

Studies the behavior patterns of large language models in reasoning tasks.

Identifies specific attention heads responsible for short reasoning.

🚀 ThinkEdit-deepseek-qwen-14b

ThinkEdit is a lightweight weight-editing method that enhances the performance of reasoning models by addressing the issue of overly short reasoning.

🚀 Quick Start

This README provides information about the ThinkEdit models, including their performance results and usage instructions.

✨ Features

Lightweight Editing: Identifies a small percentage of "short reasoning" attention heads and edits only a tiny fraction of total parameters.
Performance Boost: Significantly improves the accuracy of reasoning models, especially on cases with short reasoning traces.

📚 Documentation

Repository Information

Repository for: ThinkEdit-deepseek-qwen-14b
Authors: Chung-En Sun, Ge Yan, Tsui-Wei Weng
Paper: ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models
Code: https://github.com/Trustworthy-ML-Lab/ThinkEdit

Introduction

Reasoning-augmented models sometimes fail by generating overly short, abstract chain-of-thought (CoT) reasoning, hurting their accuracy. ThinkEdit is a lightweight weight-editing method that:

Identifies ~2% of "short reasoning" attention heads
Edits only ~0.1% of total parameters
Removes the "short reasoning" direction from their output
Boosts performance, especially on cases with short reasoning traces

Full Performance Results

1. Overall Accuracy

Model	GSM8K	MMLU Elementary Math	MATH-Level1	MATH-Level5	MATH-500
deepseek-qwen-14b	90.80 ± 0.36	95.08 ± 0.65	96.32 ± 0.35	90.25 ± 0.72	91.48 ± 0.55
ThinkEdit-deepseek-qwen-14b	93.50 ± 0.31	96.53 ± 0.54	96.50 ± 0.46	91.15 ± 0.59	91.78 ± 0.58
deepseek-llama3-8b	82.26 ± 0.91	96.01 ± 0.62	93.46 ± 0.84	85.49 ± 0.83	87.26 ± 1.16
ThinkEdit-deepseek-llama3-8b	88.97 ± 0.78	96.08 ± 0.86	94.12 ± 0.47	85.91 ± 0.48	87.60 ± 0.81
deepseek-qwen-1.5b	79.15 ± 1.08	68.52 ± 1.56	93.00 ± 0.33	75.48 ± 0.90	82.22 ± 1.29
ThinkEdit-deepseek-qwen-1.5b	83.34 ± 0.79	86.24 ± 1.12	93.89 ± 0.76	74.94 ± 0.85	82.74 ± 0.77

2. Accuracy on Short Reasoning Cases (Top 5% / 10% / 20%)

Model	GSM8K	MMLU Elementary Math	MATH-Level1	MATH-Level5	MATH-500
deepseek-qwen-14b	96.31 / 95.65 / 92.93	93.89 / 96.22 / 95.60	99.52 / 99.30 / 97.70	89.39 / 94.32 / 96.25	86.40 / 91.40 / 93.50
ThinkEdit-deepseek-qwen-14b	96.62 / 96.03 / 96.12	96.11 / 96.22 / 96.27	100.00 / 99.77 / 98.85	95.76 / 97.65 / 98.07	89.60 / 92.60 / 94.70
deepseek-llama3-8b	88.92 / 87.18 / 85.82	97.22 / 96.49 / 96.80	97.14 / 94.88 / 94.83	78.64 / 88.79 / 93.41	82.00 / 81.40 / 88.30
ThinkEdit-deepseek-llama3-8b	97.08 / 95.27 / 93.95	97.78 / 98.65 / 97.87	100.00 / 99.30 / 98.62	95.61 / 96.89 / 97.12	92.80 / 93.60 / 94.40
deepseek-qwen-1.5b	88.46 / 87.48 / 85.02	62.78 / 62.16 / 60.53	97.62 / 95.12 / 93.91	91.52 / 95.00 / 95.72	82.40 / 89.80 / 93.40
ThinkEdit-deepseek-qwen-1.5b	92.46 / 92.37 / 92.05	77.22 / 80.54 / 79.73	96.19 / 95.81 / 97.36	93.79 / 95.83 / 95.80	92.80 / 94.40 / 94.90

3. Reasoning Lengths (Top 5% / 10% / 20% Shortest Responses)

Model	GSM8K	MMLU Elementary Math	MATH-Level1	MATH-Level5	MATH-500
deepseek-qwen-14b	76.6 / 86.5 / 99.1	65.8 / 72.2 / 80.6	93.7 / 114.3 / 188.6	628.8 / 858.4 / 1125.9	198.7 / 434.3 / 697.0
ThinkEdit-deepseek-qwen-14b	95.4 / 106.3 / 120.2	79.1 / 87.1 / 98.7	125.1 / 150.2 / 243.4	698.5 / 906.6 / 1157.2	270.2 / 492.6 / 733.3
deepseek-llama3-8b	73.0 / 83.1 / 96.6	371.0 / 438.1 / 518.2	80.3 / 97.2 / 130.3	617.9 / 854.9 / 1126.5	159.5 / 357.5 / 644.5
ThinkEdit-deepseek-llama3-8b	93.2 / 106.9 / 127.4	396.5 / 464.2 / 543.2	137.4 / 173.3 / 277.1	791.2 / 954.8 / 1185.1	305.2 / 506.3 / 737.6
deepseek-qwen-1.5b	78.8 / 89.4 / 103.0	61.6 / 68.5 / 77.6	88.8 / 110.3 / 219.7	804.6 / 1017.9 / 1314.0	249.7 / 506.5 / 760.7
ThinkEdit-deepseek-qwen-1.5b	97.2 / 109.4 / 126.3	75.9 / 85.0 / 99.5	127.9 / 174.1 / 416.4	818.0 / 984.5 / 1214.3	435.0 / 612.9 / 800.6

Usage

The usage of ThinkEdit models is exactly the same as the original deepseek-distilled models.

Citation

@misc{sun2025thinkedit,
      title={ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models}, 
      author={Chung-En Sun and Ge Yan and Tsui-Wei Weng},
      year={2025},
      eprint={2503.22048},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.22048}, 
}

📄 License

This project is licensed under the MIT license.

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご