ThinkEdit DeepSeek Qwen 14B

Developed by cesun
ThinkEdit is a lightweight weight-editing method that identifies and edits a small number of attention heads so that large language models stop generating overly short reasoning chains on reasoning tasks, thereby improving reasoning accuracy.
Downloads 46
Release Time: 3/14/2025

Model Overview

This model is based on deepseek-qwen-14b and targets the accuracy drop that occurs when the model generates overly short reasoning chains. Using an interpretable weight-editing technique, it significantly improves performance on tasks such as mathematical reasoning.

Model Features

Lightweight Weight Editing
Edits only about 0.1% of total parameters, achieving performance improvement by modifying a small number of attention heads.
Short Reasoning Mitigation
Specifically optimized to address the issue of models generating overly short reasoning chains.
Interpretability
Identifies the roughly 2% of attention heads associated with short reasoning, each with a clear editing direction.
Performance Improvement
Significantly improves accuracy on multiple mathematical reasoning datasets, especially in cases of short reasoning.
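
The listing does not spell out the edit itself, so the following is only a minimal sketch of one plausible form of it: removing the component along a "short reasoning" direction from the rows of a single head's output-projection slice. The function names, the toy shapes, and the direction vector are all hypothetical and chosen for illustration; nothing here is taken from the released model.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def project_out(w_head, direction):
    """Return a copy of w_head whose rows have no component along direction.

    w_head is a list of rows of length d_model, one row per head channel,
    so each row is the vector that channel writes into the residual stream.
    (This layout is an assumption made for the sketch.)
    """
    u = normalize(direction)
    edited = []
    for row in w_head:
        coeff = sum(r * x for r, x in zip(row, u))  # component along u
        edited.append([r - coeff * x for r, x in zip(row, u)])
    return edited


# Toy example: head_dim = 2, d_model = 3, hypothetical direction along axis 2.
w_head = [[1.0, 2.0, 3.0], [0.5, -1.0, 4.0]]
short_direction = [0.0, 0.0, 2.0]
edited = project_out(w_head, short_direction)
print(edited)
```

Because only the slices belonging to the selected heads are touched, an edit of this shape changes a very small share of the weights, which is consistent with the roughly 0.1% figure quoted above.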

Model Capabilities

Mathematical Problem Solving
Complex Reasoning Task Handling
Reasoning Chain Generation
Educational Applications

Use Cases

Education
Math Problem Solving
Solves math problems from elementary to high school difficulty levels.
Achieves 93.5% accuracy on the GSM8K dataset.
Academic Assessment
Evaluated on the elementary mathematics subset of MMLU.
Accuracy improves to 96.53%.
Research
Model Behavior Research
Studies the behavior patterns of large language models in reasoning tasks.
Identifies specific attention heads responsible for short reasoning.
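
As a rough illustration of what "identifying specific attention heads" could look like in practice, the sketch below ranks heads by a per-head score and keeps the top 2%. How the score is computed is not described here, so the scores, the `select_top_heads` helper, and the toy 4-layer by 25-head layout are all hypothetical.

```python
def select_top_heads(scores, fraction=0.02):
    """Return the (layer, head) keys with the highest scores.

    scores maps (layer, head) to a float; a higher value is assumed to mean
    a stronger association with short reasoning.
    """
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Toy model with 4 layers x 25 heads = 100 heads, two of which stand out.
scores = {(layer, head): 0.0 for layer in range(4) for head in range(25)}
scores[(2, 7)] = 0.9
scores[(3, 1)] = 0.8
selected = select_top_heads(scores)
print(selected)
```

Selecting by a fixed fraction is a simplification; a threshold on the score would work equally well for this kind of analysis.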