Codellama-7b-hf ReFT GSM8k

Developed by lqtrung1998
Enhances the reasoning and generalization capabilities of large language models through reinforced fine-tuning. Built on a fine-tuned CodeLlama base, it is suited to code generation and comprehension tasks.
Downloads 38
Release Time : 1/29/2024

Model Overview

The ReFT (Reinforced Fine-Tuning) method improves the performance of large language models on mathematical reasoning tasks through reinforcement learning, here optimized for the GSM8k math word problem dataset.

Model Features

Reinforcement Fine-Tuning
Optimizes model performance on mathematical reasoning tasks through reinforcement learning.
Python SDP Chain-of-Thought
Trains the model on a Python-based chain-of-thought format, in which each reasoning chain is a short executable program rather than free-form text.
Re-ranking Mechanism
Equipped with a dedicated re-ranking model to evaluate the correctness of output reasoning chains.
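To illustrate the Python-based chain-of-thought format described above: the model is trained to emit a short program whose final variable holds the answer, instead of a natural-language derivation. The sketch below is a hand-written illustration of what such an output might look like for a typical GSM8k word problem (the variable names and comment style are illustrative, not the model's exact output format).

```python
# Problem: "Natalia sold 48 clips in April, and half as many in May.
#           How many clips did she sell altogether?"
#
# A Python-style chain-of-thought answers by computing, step by step:
clips_april = 48
clips_may = clips_april // 2       # "half as many" as in April
ans = clips_april + clips_may      # total across both months
print(ans)                         # -> 72
```

Because the reasoning chain is executable, its correctness can be checked by running it, which is also what makes re-ranking candidate chains tractable.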

Model Capabilities

Mathematical Problem Solving
Python Code Generation
Structured Reasoning
Chain-of-Thought Generation

Use Cases

Education
Math Problem Solving
Solves mathematical word problems from the GSM8k dataset.
Achieves 81.2% accuracy on the GSM8k test set.
Programming Assistance
Code Generation
Generates Python solution code based on mathematical problem descriptions.